Coder Perfect

Exists 1 or Exists * as a subquery

Problem

My EXISTS checks used to look like this:

IF EXISTS (SELECT * FROM TABLE WHERE Columns=@Filters)
BEGIN
   UPDATE TABLE SET ColumnsX=ValuesX WHERE Where Columns=@Filters
END

In a previous life, one of the DBAs advised me to use SELECT 1 instead of SELECT * when using an EXISTS clause.

IF EXISTS (SELECT 1 FROM TABLE WHERE Columns=@Filters)
BEGIN
   UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END

Is there a real difference here?

Asked by Raj More

Solution #1

No, SQL Server is sophisticated enough to recognize that it is being used for an EXISTS and responds with NO DATA.

Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4

To check yourself, try running the following:

SELECT whatever
  FROM yourtable
 WHERE EXISTS( SELECT 1/0
                 FROM someothertable 
                WHERE a_valid_clause )

It would throw a div by zero error if it was actually doing something with the SELECT list. It isn’t the case.

EDIT: It’s worth noting that this is addressed in the SQL Standard.

http://www.contrib.andrew.cmu.edu/shadow/sql/sql1992.txt ANSI SQL 1992 Standard, pg 191

Answered by Matt Rogish

Solution #2

This misunderstanding is most likely due to the assumption that it will read all of the columns. It’s obvious that this isn’t the case.

CREATE TABLE T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)

CREATE NONCLUSTERED INDEX NarrowIndex ON T(Y)

IF EXISTS (SELECT * FROM T)
    PRINT 'Y'

Gives plan

Despite the fact that the index does not include all columns, SQL Server was able to check the result using the narrowest index available. The index is accessed using a semi join operator, which implies it can stop searching after the first row has been returned.

As a result, it is evident that the preceding belief is incorrect.

However, Conor Cunningham of the Query Optimiser team explains that he often chooses SELECT 1 in this scenario because it can improve query compilation performance slightly.

On an empty table with variable numbers of columns, I tested four different ways of presenting this query. SELECT 1 vs SELECT * vs SELECT Primary Key vs SELECT Other Not Null Column vs SELECT Other Not Null Column vs SELECT Other Not Null Column vs SELECT Other Not Null_

I used OPTION (RECOMPILE) to run the queries in a loop and counted the average number of executions per second. The following are the outcomes.

+-------------+----------+---------+---------+--------------+
| Num of Cols |    *     |    1    |   PK    | Not Null col |
+-------------+----------+---------+---------+--------------+
| 2           | 2043.5   | 2043.25 | 2073.5  | 2067.5       |
| 4           | 2038.75  | 2041.25 | 2067.5  | 2067.5       |
| 8           | 2015.75  | 2017    | 2059.75 | 2059         |
| 16          | 2005.75  | 2005.25 | 2025.25 | 2035.75      |
| 32          | 1963.25  | 1967.25 | 2001.25 | 1992.75      |
| 64          | 1903     | 1904    | 1936.25 | 1939.75      |
| 128         | 1778.75  | 1779.75 | 1799    | 1806.75      |
| 256         | 1530.75  | 1526.5  | 1542.75 | 1541.25      |
| 512         | 1195     | 1189.75 | 1203.75 | 1198.5       |
| 1024        | 694.75   | 697     | 699     | 699.25       |
+-------------+----------+---------+---------+--------------+
| Total       | 17169.25 | 17171   | 17408   | 17408        |
+-------------+----------+---------+---------+--------------+

As can be shown, neither SELECT 1 nor SELECT * consistently win, and the difference between the two ways is insignificant. However, the SELECT Not Null col and SELECT PK appear to be slightly faster.

All four of the queries degrade in performance as the number of columns in the table increases.

Due to the fact that the table is empty, this relationship appears to be explained solely by the amount of column metadata. From the below, it is clear that COUNT(1) is rewritten to COUNT(*) at some point during the process.

SET SHOWPLAN_TEXT ON;

GO

SELECT COUNT(1)
FROM master..spt_values

As a result, the following plan emerges.

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
            |--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))

Connecting a debugger to the SQL Server process and randomly crashing it while doing the following commands

DECLARE @V int 

WHILE (1=1)
    SELECT @V=1 WHERE EXISTS (SELECT 1 FROM ##T) OPTION(RECOMPILE)

Even when SELECT 1 is used, I found that in most cases where the table has 1,024 columns, the call stack looks like this, indicating that it is spending a significant amount of time loading column metadata (for the case where the table has 1 column, randomly breaking didn’t hit this bit of the call stack in 10 attempts).

sqlservr.exe!CMEDAccess::GetProxyBaseIntnl()  - 0x1e2c79 bytes  
sqlservr.exe!CMEDProxyRelation::GetColumn()  + 0x57 bytes   
sqlservr.exe!CAlgTableMetadata::LoadColumns()  + 0x256 bytes    
sqlservr.exe!CAlgTableMetadata::Bind()  + 0x15c bytes   
sqlservr.exe!CRelOp_Get::BindTree()  + 0x98 bytes   
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CRelOp_FromList::BindTree()  + 0x5c bytes  
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CRelOp_QuerySpec::BindTree()  + 0xbe bytes 
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CScaOp_Exists::BindScalarTree()  + 0x72 bytes  
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888)  Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart@8()  + 0x37 bytes 

This human profiling effort is supported by the VS 2012 code profiler, which displays a considerably distinct set of routines requiring compilation time in the two scenarios (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).

Both the SELECT 1 and SELECT * versions check column permissions and fail if the user does not have access to all of the table’s columns.

An example I cribbed from a conversation on the heap

CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
GO

GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T(Z) TO blat;
GO
EXECUTE AS USER = 'blat';
GO

SELECT 1
WHERE  EXISTS (SELECT 1
               FROM   T); 
/*  ↑↑↑↑ 
Fails unexpectedly with 

The SELECT permission was denied on the column 'Z' of the 
           object 'T', database 'tempdb', schema 'dbo'.*/

GO
REVERT;
DROP USER blat
DROP TABLE T

So it’s possible that the only difference between SELECT some not null col and SELECT some not null col is that it just checks permissions on that one column (though still loads the metadata for all). However, the data appear to contradict this, as the percentage difference between the two ways shrinks as the number of columns in the underlying table grows.

In any case, I’m not going to hurry out and convert all of my queries to this format because the difference is minimal and only noticeable during query compilation. The following result was obtained by removing the OPTION (RECOMPILE) so that subsequent executions might use a cached plan.

+-------------+-----------+------------+-----------+--------------+
| Num of Cols |     *     |     1      |    PK     | Not Null col |
+-------------+-----------+------------+-----------+--------------+
| 2           | 144933.25 | 145292     | 146029.25 | 143973.5     |
| 4           | 146084    | 146633.5   | 146018.75 | 146581.25    |
| 8           | 143145.25 | 144393.25  | 145723.5  | 144790.25    |
| 16          | 145191.75 | 145174     | 144755.5  | 146666.75    |
| 32          | 144624    | 145483.75  | 143531    | 145366.25    |
| 64          | 145459.25 | 146175.75  | 147174.25 | 146622.5     |
| 128         | 145625.75 | 143823.25  | 144132    | 144739.25    |
| 256         | 145380.75 | 147224     | 146203.25 | 147078.75    |
| 512         | 146045    | 145609.25  | 145149.25 | 144335.5     |
| 1024        | 148280    | 148076     | 145593.25 | 146534.75    |
+-------------+-----------+------------+-----------+--------------+
| Total       | 1454769   | 1457884.75 | 1454310   | 1456688.75   |
+-------------+-----------+------------+-----------+--------------+

You can find the test script I used here.

Answered by Martin Smith

Solution #3

The best approach to find out is to run both versions through their paces and examine their execution plans. Choose a table with a large number of columns.

Answered by HLGEM

Solution #4

In SQL Server, there is no difference, and it has never been a problem. The optimizer recognizes that they are identical. If you look at the execution plans, you will see that they are identical.

Answered by Cade Roux

Solution #5

Personally, I find it difficult to believe that they don’t use the same query plan. However, the only way to know for sure in your own case is to put it to the test. Please let us know if you succeed!

Answered by Larry Lustig

Post is based on https://stackoverflow.com/questions/1597442/subquery-using-exists-1-or-exists