Problem
In PostgreSQL, I'm trying to write the following query:
select name, author_id, count(1),
       (select count(1)
        from names as n2
        where n2.id = n1.id
        and n2.author_id = n1.author_id
       )
from names as n1
group by name, author_id
This would work fine on Microsoft SQL Server, but not on PostgreSQL. I looked through the docs, and it appears that I could rewrite it as:
select name, author_id, count(1), total
from names as n1, (select count(1) as total
                   from names as n2
                   where n2.id = n1.id
                   and n2.author_id = n1.author_id
                  ) as total
group by name, author_id
On PostgreSQL, however, this results in the following error: "subquery in FROM cannot refer to other relations of same query level." As a result, I'm stuck. Is there any way to do this?
Thanks
Asked by Ricardo
Solution #1
I'm not sure I fully understand your intent, but the following might be close:
select n1.name, n1.author_id, count_1, total_count
from (select id, name, author_id, count(1) as count_1
      from names
      group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
            from names
            group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)
Unfortunately, this requires grouping the first subquery by id, name, and author_id, which I don't believe was intended. I'm not sure how to get around that, because id is needed to join to the second subquery. Someone else might be able to come up with a better solution.
Share and enjoy.
Answered by Bob Jarvis – Reinstate Monica
Solution #2
To add to the answers from @Bob Jarvis and @dmikam: PostgreSQL can choose a much more expensive plan when you don't use LATERAL. In the simulation below, both queries return the same data, but their estimated costs differ substantially.
Table structure
CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);
INSERT INTO ITEMS
SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
FROM
    generate_series(1,1000000);
CREATE INDEX N_INDEX ON ITEMS(N);
Using JOIN with GROUP BY in a subquery without the use of LATERAL
EXPLAIN
SELECT
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);
Results
Merge Join (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  -> Index Scan using n_index on items i (cost=0.43..101.28 rows=23 width=37)
       Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  -> GroupAggregate (cost=0.43..626631.11 rows=861418 width=12)
       Group Key: items.n
       -> Index Only Scan using n_index on items (cost=0.43..593016.93 rows=10000000 width=4)
Using LATERAL
EXPLAIN
SELECT
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);
Results
Nested Loop (cost=9.49..1319.97 rows=276 width=37)
  -> Bitmap Heap Scan on items i (cost=9.06..100.20 rows=23 width=37)
       Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
       -> Bitmap Index Scan on n_index (cost=0.00..9.05 rows=23 width=0)
            Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  -> GroupAggregate (cost=0.43..52.79 rows=12 width=12)
       Group Key: items.n
       -> Index Only Scan using n_index on items (cost=0.43..52.64 rows=12 width=4)
            Index Cond: (n = i.n)
I'm using PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1).
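The plans above show only the planner's estimated costs. As a rough sketch for checking actual run time, and assuming the ITEMS table and N_INDEX index from the setup above, the LATERAL query can be run with EXPLAIN (ANALYZE, BUFFERS), which executes the statement and reports real row counts and timings. The ANALYZE step and the CNT alias are additions for illustration, not part of the original test.

```sql
-- Refresh planner statistics so the row estimates reflect the loaded data.
ANALYZE ITEMS;

-- EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual
-- timings, row counts, and buffer usage next to the estimates.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    I.*, I2.CNT
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT COUNT(*) AS CNT   -- CNT is an illustrative alias
    FROM ITEMS
    WHERE N = I.N            -- correlated reference to the outer row
) I2 ON TRUE
WHERE I.N IN (243477, 997947);
```

The same EXPLAIN (ANALYZE, BUFFERS) prefix can be placed in front of the non-LATERAL query to compare the two on actual execution time rather than on estimates alone.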
Answered by deFreitas
Solution #3
I know this is an old question, but starting with PostgreSQL 9.3 you can use the LATERAL keyword to write correlated subqueries inside JOINs, so the query from the question would become:
SELECT
    n1.name, n1.author_id, count(*), t.total
FROM
    names AS n1
    INNER JOIN LATERAL (
        SELECT
            count(*) AS total
        FROM
            names AS n2
        WHERE
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) AS t ON 1=1
GROUP BY
    -- t.total is selected outside an aggregate, so it must be grouped too
    n1.name, n1.author_id, t.total
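To try the LATERAL form on a small data set, here is a minimal sketch. The table definition and sample rows are hypothetical, since the question does not show the real names table, and the subquery correlates on author_id only (an assumption, matching the per-author total the asker ends up using in Solution #4).

```sql
-- Hypothetical sample data; the real "names" table is not shown in the question.
CREATE TEMP TABLE names (
    id        integer,
    name      text,
    author_id integer
);

INSERT INTO names (id, name, author_id) VALUES
    (1, 'alpha', 10),
    (2, 'alpha', 10),
    (3, 'beta',  10),
    (4, 'gamma', 20);

-- For each (name, author_id) group, also report the author's total row count.
SELECT n1.name, n1.author_id, count(*) AS name_count, t.total
FROM names AS n1
INNER JOIN LATERAL (
    SELECT count(*) AS total
    FROM names AS n2
    WHERE n2.author_id = n1.author_id   -- correlated reference to the outer row
) AS t ON true
GROUP BY n1.name, n1.author_id, t.total
ORDER BY n1.author_id, n1.name;

--  name  | author_id | name_count | total
--  alpha |        10 |          2 |     3
--  beta  |        10 |          1 |     3
--  gamma |        20 |          1 |     1
```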
Answered by dmikam
Solution #4
I'm just posting the formatted version of the final SQL I needed, based on Bob Jarvis' answer, which I mentioned in my earlier comment:
select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
from (select id, name, author_id, count(1) as count_1
      from names
      group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
            from names
            group by author_id) n2
  on (n2.author_id = n1.author_id)
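As an alternative sketch (a different technique, not the approach used in this answer), the per-author total can also be computed with a window function over the grouped rows, which avoids the second subquery and the join. This assumes a names table with name and author_id columns, and it groups by name and author_id as in the original question rather than by id; the share alias is illustrative.

```sql
-- Assumes a "names" table with at least the columns name and author_id.
SELECT name,
       author_id,
       -- count(*) is the per-group count; the window sum adds those counts
       -- up across all groups that belong to the same author.
       count(*)::numeric / sum(count(*)) OVER (PARTITION BY author_id) AS share
FROM names
GROUP BY name, author_id;
```

The window function is evaluated after GROUP BY, which is why an aggregate such as count(*) can appear inside sum(...) OVER here.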
Answered by Ricardo
Solution #5
select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
from (select id, name, author_id, count(1) as count_1
      from names
      group by id, name, author_id) n1
-- DISTINCT cannot be combined with a plain count(1) without GROUP BY,
-- so the per-author count is written as a window function here.
inner join (select distinct author_id,
                   count(1) over (partition by author_id) as total_count
            from names) n2
  on (n2.author_id = n1.author_id)
If the query has more inner joins, consider using DISTINCT in this way, because adding more grouped subqueries to the join can be slow.
Answered by Zahid Gani
Post is based on https://stackoverflow.com/questions/3004887/how-to-do-a-postgresql-subquery-in-select-clause-with-join-in-from-clause-like-s