Coder Perfect

Which is better: COUNT(*) vs. COUNT(1) vs. COUNT(pk)? [duplicate]

Problem

I frequently come across the following three variations:

SELECT COUNT(*) FROM Foo;
SELECT COUNT(1) FROM Foo;
SELECT COUNT(PrimaryKey) FROM Foo;

As far as I can tell, they all do the same thing, and I find myself using all three in my codebase. However, I don’t like doing the same thing in different ways. Which one should I stick to? Is any one of them better than the other two?

Asked by zneak

Solution #1

Use either COUNT(field) or COUNT(*) and stick with it consistently. If your database allows COUNT(tableHere) or COUNT(tableHere.*), use that.

In short, don’t use COUNT(1) for anything. It’s a one-trick pony that rarely does what you want, and in those rare cases it is equivalent to COUNT(*).

Use * for every query that needs to count everything, even with joins:

SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

However, don’t use COUNT(*) for LEFT JOINs: it returns 1 even when a boss has no matching rows in the subordinate table, because the unmatched boss row itself is still counted.

SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id
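
The difference between the two queries above can be checked with a small sketch. It uses Python's built-in sqlite3 module and made-up sample data (one boss with two subordinates, one boss with none); the sub_id column and the counts_per_boss helper are assumptions for illustration only. As far as I know, SQLite does not accept COUNT(subordinate.*), so the sketch counts subordinate.boss_id instead, which is likewise NULL for an unmatched boss.

```python
import sqlite3

# Hypothetical sample data: boss 1 has two subordinates, boss 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boss (boss_id INTEGER PRIMARY KEY);
    CREATE TABLE subordinate (
        sub_id  INTEGER PRIMARY KEY,  -- assumed column, not in the original
        boss_id INTEGER NOT NULL
    );
    INSERT INTO boss (boss_id) VALUES (1), (2);
    INSERT INTO subordinate (sub_id, boss_id) VALUES (10, 1), (11, 1);
""")


def counts_per_boss(count_expr):
    """Return {boss_id: count} for the given COUNT(...) expression."""
    sql = f"""
        SELECT boss.boss_id, {count_expr}
        FROM boss
        LEFT JOIN subordinate ON subordinate.boss_id = boss.boss_id
        GROUP BY boss.boss_id
    """
    return dict(conn.execute(sql).fetchall())


star_counts = counts_per_boss("COUNT(*)")
column_counts = counts_per_boss("COUNT(subordinate.boss_id)")

print(star_counts)    # boss 2 is reported with 1 (the unmatched row itself)
print(column_counts)  # boss 2 is reported with 0
```

Only the boss without subordinates differs between the two results, which is exactly the case the LEFT JOIN warning is about.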

Don’t be fooled by those who claim that * in COUNT is slow because it fetches the entire row from your table. The * in SELECT COUNT(*) and the * in SELECT * have no bearing on each other; they are entirely different things that just happen to share a common token, i.e. *.

In fact, if naming a field the same as its table were not permitted, RDBMS language designers could give COUNT(tableNameHere) the same semantics as COUNT(*). Example:

For counting rows, we could have this:

SELECT COUNT(emp) FROM emp

And they could make it simpler:

SELECT COUNT() FROM emp

And for LEFT JOINs, we could have this:

SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

However, they cannot do that (COUNT(tableNameHere)), since the SQL standard permits naming a field the same as its table:

CREATE TABLE fruit -- ORM-friendly name
(
    fruit_id int NOT NULL,
    fruit varchar(50), /* same name as table name,
                          and let's say, someone forgot to put NOT NULL */
    shape varchar(50) NOT NULL,
    color varchar(50) NOT NULL
)

It is also not good practice to make a field nullable if its name matches the table name. Say the fruit field holds the values 'Banana', 'Apple', NULL, and 'Pears'. The following query will not count all of the rows; it returns 3, not 4:

SELECT count(fruit) FROM fruit
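
A minimal sketch of that result, assuming Python's built-in sqlite3 module as the test bed. The fruit_id, shape, and color values are invented for illustration; only the four fruit values come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruit (
        fruit_id INTEGER NOT NULL,
        fruit    VARCHAR(50),           -- nullable, same name as the table
        shape    VARCHAR(50) NOT NULL,
        color    VARCHAR(50) NOT NULL
    );
    -- shape and color values are made up for this sketch
    INSERT INTO fruit VALUES
        (1, 'Banana', 'long',  'yellow'),
        (2, 'Apple',  'round', 'red'),
        (3, NULL,     'round', 'green'),
        (4, 'Pears',  'pear',  'green');
""")

# Here the name resolves to the column, so the NULL row is skipped.
non_null_fruit = conn.execute("SELECT COUNT(fruit) FROM fruit").fetchone()[0]
all_rows = conn.execute("SELECT COUNT(*) FROM fruit").fetchone()[0]

print(non_null_fruit, all_rows)
```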

Some RDBMSs do follow this principle and accept the table name as COUNT’s argument for counting the table’s rows. This works in PostgreSQL, as long as neither of the two tables below has a field named subordinate, i.e. as long as there is no name conflict between a field name and the table name:

SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

But that could cause confusion later if we add a subordinate field to the table, as the query would then count the field (which could be nullable), not the table’s rows.

To be on the safe side, use the following:

SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

In particular, COUNT(1) is a one-trick pony that works well only on single-table queries:

SELECT COUNT(1) FROM tbl

However, when you use joins, that trick won’t work on multi-table queries without its semantics becoming confused. In particular, you cannot write:

-- count the subordinates that belong to boss
SELECT boss.boss_id, COUNT(subordinate.1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

So, what does COUNT(1) mean in this context?

SELECT boss.boss_id, COUNT(1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

Is it this…?

-- counting all the subordinates only
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

Or this…?

-- or does COUNT(1) also count 1 for a boss regardless of whether the boss has a subordinate?
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

With some careful thought, you can infer that COUNT(1) is the same as COUNT(*), regardless of the type of join. For LEFT JOINs, however, we cannot make COUNT(1) behave like COUNT(subordinate.boss_id) or COUNT(subordinate.*).

Simply use one of the following:

-- count the subordinates that belong to boss
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

This works in PostgreSQL, and it makes clear that you want to count the cardinality of the set:

-- count the subordinates that belong to boss
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.boss_id

Another way to count the cardinality of the set, and a very English-like one (just don’t name a column the same as its table): http://www.sqlfiddle.com/#!1/98515/7

select boss.boss_name, count(subordinate)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

This is something you can’t do: http://www.sqlfiddle.com/#!1/98515/8

select boss.boss_name, count(subordinate.1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name

You can do it, but the result will be incorrect: http://www.sqlfiddle.com/#!1/98515/9

select boss.boss_name, count(1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
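
To see why the count(1) result is wrong for this purpose, here is a hedged sketch using Python's built-in sqlite3 module. The schema behind the sqlfiddle links is not shown in this post, so the tables, the subordinate_name column, and the sample names below are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema and data; the original sqlfiddle schema is not shown.
    CREATE TABLE boss (boss_code TEXT PRIMARY KEY, boss_name TEXT NOT NULL);
    CREATE TABLE subordinate (
        subordinate_name TEXT NOT NULL,
        boss_code        TEXT NOT NULL
    );
    INSERT INTO boss VALUES ('B1', 'Alice'), ('B2', 'Bob');
    INSERT INTO subordinate VALUES ('Carol', 'B1'), ('Dave', 'B1');
""")

query = """
    SELECT boss.boss_name,
           COUNT(1),
           COUNT(*),
           COUNT(subordinate.boss_code)
    FROM boss
    LEFT JOIN subordinate ON subordinate.boss_code = boss.boss_code
    GROUP BY boss.boss_name
    ORDER BY boss.boss_name
"""
rows = conn.execute(query).fetchall()

for boss_name, count_one, count_star, count_column in rows:
    print(boss_name, count_one, count_star, count_column)
```

Bob has no subordinates, yet COUNT(1) and COUNT(*) both report 1 for him; only the count on the subordinate column reports 0.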

Answered by 14 revs, 3 users 89%

Solution #2

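Two of them, COUNT(*) and COUNT(1), always produce the same answer: the number of rows.

Assuming that pk is a primary key and that no nulls are allowed in its values, COUNT(pk) also counts the number of rows.

However, if the column is not constrained to be NOT NULL, COUNT(column) produces a different answer: it counts only the rows that have a non-null value in that column.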
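
The difference can be checked with a small sketch, here using Python's built-in sqlite3 module. The foo table layout and the possibly_null column are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (
        pk            INTEGER PRIMARY KEY,  -- never NULL
        possibly_null INTEGER               -- hypothetical nullable column
    );
    INSERT INTO foo (pk, possibly_null) VALUES (1, 10), (2, NULL), (3, 30);
""")

counts = conn.execute("""
    SELECT COUNT(*), COUNT(1), COUNT(pk), COUNT(possibly_null)
    FROM foo
""").fetchone()

print(counts)  # the first three agree; the last skips the NULL row
```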

Normally, I write COUNT(*), since it is the original recommended notation for SQL. Similarly, with the EXISTS clause, I normally write WHERE EXISTS(SELECT * FROM…) because that was the original recommended notation. There should be no benefit to the alternatives; the optimizer should see through the more obscure notations.
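
As a rough illustration of the EXISTS point, the sketch below runs the same query with SELECT * and with SELECT 1 inside EXISTS. It relies on Python's built-in sqlite3 module; the tables, the sub_id column, and the bosses_with_subordinates helper are hypothetical. It only shows that the results match, and says nothing about any particular optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables and data for illustration.
    CREATE TABLE boss (boss_id INTEGER PRIMARY KEY);
    CREATE TABLE subordinate (
        sub_id  INTEGER PRIMARY KEY,
        boss_id INTEGER NOT NULL
    );
    INSERT INTO boss (boss_id) VALUES (1), (2), (3);
    INSERT INTO subordinate (sub_id, boss_id) VALUES (10, 1), (11, 3);
""")


def bosses_with_subordinates(select_list):
    """Return boss ids that have at least one subordinate."""
    sql = f"""
        SELECT boss_id FROM boss
        WHERE EXISTS (SELECT {select_list} FROM subordinate
                      WHERE subordinate.boss_id = boss.boss_id)
        ORDER BY boss_id
    """
    return [row[0] for row in conn.execute(sql)]


with_star = bosses_with_subordinates("*")
with_one = bosses_with_subordinates("1")

print(with_star, with_one)
```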

Answered by Jonathan Leffler

Solution #3

This has been asked and answered before…

Books Online (the SQL Server documentation) gives the syntax as “COUNT ( [[ALL | DISTINCT] expression] | *)”.

“1” is a non-null expression, so COUNT(1) is the same as COUNT(*). The optimiser recognises it as trivial and so produces the same plan. A PK is unique and non-null (in SQL Server at least), so COUNT(PK) = COUNT(*).

This is a similar myth to EXISTS (SELECT *… versus EXISTS (SELECT 1…

Also see section 6.5, General Rules, case 1 of the ANSI 92 specification:

        a) If COUNT(*) is specified, then the result is the cardinality
          of T.

        b) Otherwise, let TX be the single-column table that is the
          result of applying the <value expression> to each row of T
          and eliminating null values. If one or more null values are
          eliminated, then a completion condition is raised: warning-
          null value eliminated in set function.
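
The two rules can be observed with a short sketch, assuming Python's built-in sqlite3 module; the one-column table t and its values are made up. COUNT(*) gives the cardinality of the table, while COUNT(x) and COUNT(DISTINCT x) are computed after null values have been eliminated. SQLite does not surface the "null value eliminated" warning, so the sketch shows only the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (x TEXT);  -- hypothetical one-column table
    INSERT INTO t (x) VALUES ('a'), ('a'), (NULL), ('b');
""")

cardinality, non_null, distinct_non_null = conn.execute(
    "SELECT COUNT(*), COUNT(x), COUNT(DISTINCT x) FROM t"
).fetchone()

print(cardinality, non_null, distinct_non_null)
```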

Answered by gbn

Solution #4

They’re all the same on Oracle, at least: http://www.oracledba.co.uk/tips/count_speed.htm

Answered by ZeissS

Solution #5

I believe the performance characteristics differ from one DBMS to another; it all depends on how each one chooses to implement it. Since I have worked extensively with Oracle, I’ll speak from that perspective.

COUNT(*) – Fetches the entire row into the result set before passing it on to the count function; the count function adds 1 if the row is not null.

COUNT(1) – Does not fetch any rows; instead, count is called with a constant value of 1 for each row in the table that satisfies the WHERE condition.

COUNT(PK) – In Oracle, the PK is indexed, so Oracle has to read only the index. A row in the index B+ tree is typically many times smaller than the actual row, so for the same disk IOPS Oracle can fetch many times more rows from the index in a single block transfer than it could with full rows. This leads to higher throughput for the query.

From this, you can see that on Oracle the first count is the slowest and the last count is the fastest.

Answered by arunmur

Post is based on https://stackoverflow.com/questions/2710621/count-vs-count1-vs-countpk-which-is-better