Coder Perfect

How can you do a Postgresql subquery with a join in the from clause like SQL Server?

Problem

On Postgresql, I’m attempting to write the following query:

select name, author_id, count(1), 
    (select count(1)
    from names as n2
    where n2.id = n1.id
        and t2.author_id = t1.author_id
    )               
from names as n1
group by name, author_id

This would work fine on Microsoft SQL Server, but not so much on Postegresql. I looked through the docs and it appears that I could rewrite it as

select name, author_id, count(1), total                     
from names as n1, (select count(1) as total
    from names as n2
    where n2.id = n1.id
        and n2.author_id = t1.author_id
    ) as total
group by name, author_id

On postegresql, however, this results in the following error: “subquery in FROM cannot refer to other relations of same query level.” As a result, I’m stuck. Is there any way for me to do that?

Thanks

Asked by Ricardo

Solution #1

I’m not sure I grasp your purpose completely, however the following might be close:

select n1.name, n1.author_id, count_1, total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select id, author_id, count(1) as total_count
              from names
              group by id, author_id) n2
  on (n2.id = n1.id and n2.author_id = n1.author_id)

Unfortunately, this requires grouping the first subquery by id, name, and author id, which I don’t believe was intended. I’m not sure how to get around it because you need id to join the second subquery. Someone else might be able to come up with a better solution.

Share and enjoy.

Answered by Bob Jarvis – Reinstate Monica

Solution #2

In addition to @Bob Jarvis and @dmikam’s responses, Postgres does not perform well when you don’t use LATERAL. In the simulation below, the query data returns are the same in both circumstances, but the costs are substantially different.

Table structure

CREATE TABLE ITEMS (
    N INTEGER NOT NULL,
    S TEXT NOT NULL
);

INSERT INTO ITEMS
  SELECT
    (random()*1000000)::integer AS n,
    md5(random()::text) AS s
  FROM
    generate_series(1,1000000);

CREATE INDEX N_INDEX ON ITEMS(N);

Using JOIN with GROUP BY in a subquery without the use of LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    GROUP BY N
) I2 ON I2.N = I.N
WHERE I.N IN (243477, 997947);

The results

Merge Join  (cost=0.87..637500.40 rows=23 width=37)
  Merge Cond: (i.n = items.n)
  ->  Index Scan using n_index on items i  (cost=0.43..101.28 rows=23 width=37)
        Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..626631.11 rows=861418 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..593016.93 rows=10000000 width=4)

Using LATERAL

EXPLAIN 
SELECT 
    I.*
FROM ITEMS I
INNER JOIN LATERAL (
    SELECT 
        COUNT(1), n
    FROM ITEMS
    WHERE N = I.N
    GROUP BY N
) I2 ON 1=1 --I2.N = I.N
WHERE I.N IN (243477, 997947);

Results

Nested Loop  (cost=9.49..1319.97 rows=276 width=37)
  ->  Bitmap Heap Scan on items i  (cost=9.06..100.20 rows=23 width=37)
        Recheck Cond: (n = ANY ('{243477,997947}'::integer[]))
        ->  Bitmap Index Scan on n_index  (cost=0.00..9.05 rows=23 width=0)
              Index Cond: (n = ANY ('{243477,997947}'::integer[]))
  ->  GroupAggregate  (cost=0.43..52.79 rows=12 width=12)
        Group Key: items.n
        ->  Index Only Scan using n_index on items  (cost=0.43..52.64 rows=12 width=4)
              Index Cond: (n = i.n)

PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) is the version I’m using.

Answered by deFreitas

Solution #3

I know it’s old, but starting with Postgresql 9.3, you can use the keyword “LATERAL” to use RELATED subqueries inside of JOINS, thus the query from the question would be:

SELECT 
    name, author_id, count(*), t.total
FROM
    names as n1
    INNER JOIN LATERAL (
        SELECT 
            count(*) as total
        FROM 
            names as n2
        WHERE 
            n2.id = n1.id
            AND n2.author_id = n1.author_id
    ) as t ON 1=1
GROUP BY 
    n1.name, n1.author_id

Answered by dmikam

Solution #4

I’m simply responding here with the formatted version of the final sql I required, based on Bob Jarvis’ response, which I posted in my previous comment:

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select author_id, count(1) as total_count
              from names
              group by author_id) n2
  on (n2.author_id = n1.author_id)

Answered by Ricardo

Solution #5

select n1.name, n1.author_id, cast(count_1 as numeric)/total_count
  from (select id, name, author_id, count(1) as count_1
          from names
          group by id, name, author_id) n1
inner join (select distinct(author_id), count(1) as total_count
              from names) n2
  on (n2.author_id = n1.author_id)
Where true

If there are more inner joins, utilize distinct because the performance of more join groups is slow.

Answered by Zahid Gani

Post is based on https://stackoverflow.com/questions/3004887/how-to-do-a-postgresql-subquery-in-select-clause-with-join-in-from-clause-like-s