Coder Perfect

In PostgreSQL, how do you combine RETURNING and ON CONFLICT?

Problem

In PostgreSQL 9.5, I have the following UPSERT:

INSERT INTO chats ("user", "contact", "name") 
           VALUES ($1, $2, $3), 
                  ($2, $1, NULL) 
ON CONFLICT("user", "contact") DO NOTHING
RETURNING id;

It yields something like this if there are no conflicts:

----------
    | id |
----------
  1 | 50 |
----------
  2 | 51 |
----------

However, rows that hit a conflict are not returned, so if every row conflicts, the result is empty:

----------
    | id |
----------

If there are no conflicts, I want to return the new id values; otherwise, I want to return the existing id values of the conflicting rows. Is that possible? If so, how?

Asked by zola
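The question does not show the table definition. The sketch below assumes a minimal schema (a serial id plus a UNIQUE constraint on ("user", "contact"), in a temporary table so nothing permanent is touched) and replays the question's statement with literals in place of $1, $2, $3 to reproduce the behavior described above. It assumes PostgreSQL 9.5 or later.

```sql
-- Assumed schema: the real table definition is not shown in the question.
DROP TABLE IF EXISTS pg_temp.chats;       -- drops only a temporary table
CREATE TEMP TABLE chats (
   id        serial PRIMARY KEY,
   "user"    text NOT NULL,
   "contact" text NOT NULL,
   "name"    text,
   UNIQUE ("user", "contact")             -- the conflict target used below
);

-- A pre-existing row that the next statement will conflict with.
INSERT INTO chats ("user", "contact", "name") VALUES ('alice', 'bob', 'first');

-- The statement from the question, with literals in place of $1, $2, $3.
INSERT INTO chats ("user", "contact", "name")
VALUES ('alice', 'bob', 'Alice and Bob')
     , ('bob', 'alice', NULL)
ON CONFLICT ("user", "contact") DO NOTHING
RETURNING id;
-- Only the newly inserted ('bob', 'alice') row comes back;
-- the id of the conflicting ('alice', 'bob') row is not returned.
```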

Solution #1

For a single conflict target, few conflicts, small tuples, and no triggers, the currently accepted answer (Alextoni’s DO UPDATE approach, Solution #2 below) seems adequate. It avoids concurrency issue 1 (see below) by brute force. The simple solution has its appeal, and the side effects may matter little.

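In all other cases, however, do not update identical rows without need. Even if you see no difference on the surface, there are various side effects: an empty update can fire triggers that should not fire, it write-locks "innocent" rows (possibly imposing costs on concurrent transactions), and, most importantly, under PostgreSQL’s MVCC model every UPDATE writes a new row version whether or not the data changed. That costs performance for the UPSERT itself and causes table and index bloat, slower subsequent operations on the table, and extra VACUUM work.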
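Under PostgreSQL’s MVCC model, an UPDATE writes a new row version even when no value changes. A minimal sketch that makes this visible through the ctid system column (the physical location of the current row version), using a hypothetical temporary chats table with the answer’s column names:

```sql
-- Hypothetical temporary table, for illustration only.
DROP TABLE IF EXISTS pg_temp.chats;
DROP TABLE IF EXISTS pg_temp.before_upsert;
CREATE TEMP TABLE chats (
   id      serial PRIMARY KEY,
   usr     text NOT NULL,
   contact text NOT NULL,
   name    text,
   UNIQUE (usr, contact)
);
INSERT INTO chats (usr, contact, name) VALUES ('foo1', 'bar1', 'bob1');

-- Remember where the current row version lives.
CREATE TEMP TABLE before_upsert AS
SELECT id, ctid AS old_ctid FROM chats;

-- "Empty" update: the new value equals the old one.
INSERT INTO chats (usr, contact, name) VALUES ('foo1', 'bar1', 'bob1')
ON CONFLICT (usr, contact) DO UPDATE SET name = EXCLUDED.name
RETURNING id;

-- A new row version was written anyway, so the ctid differs.
SELECT c.id, b.old_ctid, c.ctid AS new_ctid, b.old_ctid <> c.ctid AS new_version
FROM   chats c
JOIN   before_upsert b USING (id);
```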

Furthermore, ON CONFLICT DO UPDATE is not always practical or even possible. As the manual states, for ON CONFLICT DO UPDATE a conflict_target must be provided.

A single "conflict target" is not possible when multiple indexes or constraints are involved. There is, however, a related solution for multiple partial indexes.

Back to the topic: you can achieve (almost) the same result without empty updates and their side effects. Some of the following solutions also work with ON CONFLICT DO NOTHING (no "conflict target"), which catches all possible conflicts; that may or may not be desirable. Assuming no concurrent write load, a query like this does the job:

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo1', text 'bar1', text 'bob1')  -- type casts in first row
    , ('foo2', 'bar2', 'bob2')
    -- more?
   )
, ins AS (
   INSERT INTO chats (usr, contact, name) 
   SELECT * FROM input_rows
   ON CONFLICT (usr, contact) DO NOTHING
   RETURNING id  --, usr, contact              -- return more columns?
   )
SELECT 'i' AS source                           -- 'i' for 'inserted'
     , id  --, usr, contact                    -- return more columns?
FROM   ins
UNION  ALL
SELECT 's' AS source                           -- 's' for 'selected'
     , c.id  --, usr, contact                  -- return more columns?
FROM   input_rows
JOIN   chats c USING (usr, contact);           -- columns of unique index

The source column is an optional addition that shows how this works. You may actually need it to tell the two cases apart (another advantage over empty writes).

The final JOIN works because rows newly inserted by an attached data-modifying CTE are not yet visible in the underlying table. (All parts of the same SQL statement see the same snapshots of the underlying tables.)
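A minimal sketch of that snapshot rule, using a hypothetical temporary chats table with the answer’s column names: the main SELECT runs alongside the data-modifying CTE but does not see the row the CTE inserts.

```sql
-- Hypothetical temporary table with one pre-existing row.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id      serial PRIMARY KEY,
   usr     text NOT NULL,
   contact text NOT NULL,
   name    text,
   UNIQUE (usr, contact)
);
INSERT INTO chats (usr, contact, name) VALUES ('foo1', 'bar1', 'bob1');

WITH ins AS (
   INSERT INTO chats (usr, contact, name)
   VALUES ('foo2', 'bar2', 'bob2')
   RETURNING id
)
SELECT (SELECT count(*) FROM ins)   AS rows_inserted        -- 1
     , (SELECT count(*) FROM chats) AS rows_seen_in_chats;  -- 1, not 2: same snapshot
```

The same rule keeps the final JOIN in the query above from returning freshly inserted rows a second time.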

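Since the VALUES expression is free-standing (not directly attached to an INSERT), Postgres cannot derive data types from the target columns, and you may have to add explicit type casts. The manual explains that when VALUES is used in INSERT, the values are automatically coerced to the data types of the corresponding destination columns; in other contexts it may be necessary to specify the correct data type, and if the entries are all quoted literal constants, coercing the first row is enough to determine the type for all rows.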
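A short sketch of why the casts matter. It assumes a hypothetical table chat_log with non-text columns (not part of the question): casts in the first row type the whole free-standing VALUES list, and the quoted literals in later rows are coerced to the same types.

```sql
-- Hypothetical table with non-text columns, for illustration only.
DROP TABLE IF EXISTS pg_temp.chat_log;
CREATE TEMP TABLE chat_log (
   chat_id int  NOT NULL,
   sent_on date NOT NULL
);

WITH input_rows(chat_id, sent_on) AS (
   VALUES
      (int '1', date '2024-01-01')   -- casts in the first row fix the column types
    , ('2', '2024-01-02')            -- later literals are coerced to int and date
)
INSERT INTO chat_log (chat_id, sent_on)
SELECT * FROM input_rows;

-- Without the casts, the quoted literals resolve to text on recent PostgreSQL
-- versions, and the INSERT typically fails with a type-mismatch error.
```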

The query itself (not counting side effects) may be a bit more expensive for few dupes, due to the overhead of the CTE and the additional SELECT. That SELECT should be cheap, since the perfect index exists by definition: a unique constraint is implemented with an index.

For many duplicates, it may be (much) faster. The effective cost of additional writes depends on many factors.

In any case, there are fewer side effects and hidden costs. It is most probably cheaper overall.

Attached sequences are still advanced, since default values are filled in before testing for conflicts.
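A minimal sketch of the sequence gap, assuming a hypothetical temporary chats table whose id is a serial column (a default of nextval() on an attached sequence):

```sql
-- Hypothetical temporary table; serial attaches a sequence starting at 1.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id      serial PRIMARY KEY,
   usr     text NOT NULL,
   contact text NOT NULL,
   name    text,
   UNIQUE (usr, contact)
);

INSERT INTO chats (usr, contact, name) VALUES ('foo1', 'bar1', 'bob1');  -- gets id 1

-- Conflicts and inserts nothing, but nextval() was already called for it.
INSERT INTO chats (usr, contact, name) VALUES ('foo1', 'bar1', 'bob1')
ON CONFLICT (usr, contact) DO NOTHING;

INSERT INTO chats (usr, contact, name) VALUES ('foo2', 'bar2', 'bob2')
RETURNING id;  -- 3, not 2
```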


Now for concurrent write load, assuming the default READ COMMITTED transaction isolation level.

The best strategy to defend against race conditions depends on the exact requirements, the number and size of rows in the table and in the UPSERTs, the number of concurrent transactions, the likelihood of conflicts, available resources, and other factors.

Concurrency issue 1: if a concurrent transaction has written to a row that your transaction now tries to UPSERT, your transaction has to wait for the other one to finish.

If the other transaction ends with ROLLBACK (or any error, i.e. an automatic ROLLBACK), your transaction can proceed normally. A minor possible side effect is gaps in sequential numbers, but no rows go missing.

If the other transaction ends normally (implicit or explicit COMMIT), your INSERT detects a conflict (the UNIQUE index / constraint is absolute) and does nothing, so it does not return the row either. (Since the row is not visible, it also cannot be locked as demonstrated in concurrency issue 2 below.) The SELECT sees the same snapshot as of the start of the query and cannot return the still-invisible row either.

Any such rows are missing from the result set, even though they exist in the underlying table!

This may be fine as is, especially if you are not returning rows as in the example and are satisfied knowing that the row exists. If that’s not good enough, there are various ways around it.

You can check the row count of the output and repeat the statement if it does not match the row count of the input. That may be good enough for the rare case. The point is to start a new query (it can be in the same transaction), which will then see the newly committed rows.
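A sketch of the retry idea as a PL/pgSQL function. The function name upsert_chat_pair, its parameters, and the limit of three attempts are hypothetical choices for illustration; the function lives in pg_temp (so it disappears with the session and must be called schema-qualified). Under READ COMMITTED, each statement inside the loop takes a fresh snapshot, so a retry can see rows committed by concurrent transactions in the meantime.

```sql
-- Hypothetical temporary table with the answer's column names.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id      serial PRIMARY KEY,
   usr     text NOT NULL,
   contact text NOT NULL,
   name    text,
   UNIQUE (usr, contact)
);

-- Hypothetical helper: returns one id per input row, retrying when
-- a concurrent commit made rows go missing from the result.
CREATE OR REPLACE FUNCTION pg_temp.upsert_chat_pair(_usr text, _contact text, _name text)
  RETURNS SETOF int
  LANGUAGE plpgsql AS
$func$
DECLARE
   _ids     int[];
   _attempt int := 0;
BEGIN
   LOOP
      _attempt := _attempt + 1;

      WITH input_rows(usr, contact, name) AS (
         VALUES (_usr, _contact, _name)
              , (_contact, _usr, NULL)
      )
      , ins AS (
         INSERT INTO chats (usr, contact, name)
         SELECT * FROM input_rows
         ON CONFLICT (usr, contact) DO NOTHING
         RETURNING id
      )
      SELECT array_agg(id) INTO _ids
      FROM  (
         SELECT id FROM ins
         UNION ALL
         SELECT c.id FROM input_rows JOIN chats c USING (usr, contact)
      ) sub;

      EXIT WHEN cardinality(_ids) = 2;  -- one id per input row: done

      IF _attempt >= 3 THEN             -- guard against looping forever
         RAISE EXCEPTION 'rows still missing after % attempts', _attempt;
      END IF;
   END LOOP;

   RETURN QUERY SELECT unnest(_ids);
END
$func$;

-- Usage: the second call returns the existing ids instead of nothing.
SELECT * FROM pg_temp.upsert_chat_pair('foo1', 'bar1', 'bob1');
SELECT * FROM pg_temp.upsert_chat_pair('foo1', 'bar1', 'bob1');
```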

Or check for missing result rows within the same query and overwrite those with the brute-force trick from Alextoni’s answer:

WITH input_rows(usr, contact, name) AS ( ... )  -- see above
, ins AS (
   INSERT INTO chats AS c (usr, contact, name) 
   SELECT * FROM input_rows
   ON     CONFLICT (usr, contact) DO NOTHING
   RETURNING id, usr, contact                   -- we need unique columns for later join
   )
, sel AS (
   SELECT 'i'::"char" AS source                 -- 'i' for 'inserted'
        , id, usr, contact
   FROM   ins
   UNION  ALL
   SELECT 's'::"char" AS source                 -- 's' for 'selected'
        , c.id, usr, contact
   FROM   input_rows
   JOIN   chats c USING (usr, contact)
   )
, ups AS (                                      -- RARE corner case
   INSERT INTO chats AS c (usr, contact, name)  -- another UPSERT, not just UPDATE
   SELECT i.*
   FROM   input_rows i
   LEFT   JOIN sel   s USING (usr, contact)     -- columns of unique index
   WHERE  s.usr IS NULL                         -- missing!
   ON     CONFLICT (usr, contact) DO UPDATE     -- we've asked nicely the 1st time ...
   SET    name = c.name                         -- ... this time we overwrite with old value
   -- SET name = EXCLUDED.name                  -- alternatively overwrite with *new* value
   RETURNING 'u'::"char" AS source              -- 'u' for updated
           , id  --, usr, contact               -- return more columns?
   )
SELECT source, id FROM sel
UNION  ALL
TABLE  ups;

It’s the same query as above, with one more step, the CTE ups, before the complete result set is returned. Most of the time, that last CTE does nothing. Only if rows go missing from the returned result do we resort to brute force.

This adds yet more overhead. The more conflicts with pre-existing rows, the more likely this is to outperform the simple approach.

One side effect: the second UPSERT writes rows out of order, so it reintroduces the possibility of deadlocks (see below) if three or more transactions writing to the same rows overlap. If that’s a problem, you need a different solution, such as repeating the whole statement as mentioned above.

Concurrency issue 2: if concurrent transactions can write to involved columns of affected rows, and you have to make sure the rows you found are still there at a later stage in the same transaction, you can cheaply lock existing rows in the CTE ins (which would otherwise go unlocked) with:

...
ON CONFLICT (usr, contact) DO UPDATE
SET name = name WHERE FALSE  -- never executed, but still locks the row
...

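Add a locking clause such as FOR UPDATE to the SELECT as well. Since FOR UPDATE is not allowed in a query that uses UNION, the locking SELECT has to live in its own CTE or subquery.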

This makes competing write operations wait until the end of the transaction, when all locks are released. So keep the transaction brief.
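A sketch that puts both locking pieces into one statement, assuming a hypothetical temporary chats table with one pre-existing row. The never-executed DO UPDATE locks conflicting rows in ins, and the separate CTE sel carries FOR UPDATE OF c, because a locking clause cannot be attached to the UNION ALL itself.

```sql
-- Hypothetical temporary table with one pre-existing row.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id      serial PRIMARY KEY,
   usr     text NOT NULL,
   contact text NOT NULL,
   name    text,
   UNIQUE (usr, contact)
);
INSERT INTO chats (usr, contact, name) VALUES ('foo1', 'bar1', 'bob1');

BEGIN;

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo1', text 'bar1', text 'bob1')  -- conflicts with the existing row
    , ('foo2', 'bar2', 'bob2')                 -- new row
   )
, ins AS (
   INSERT INTO chats AS c (usr, contact, name)
   SELECT * FROM input_rows
   ON     CONFLICT (usr, contact) DO UPDATE
   SET    name = c.name WHERE FALSE            -- never updates, but locks the conflicting row
   RETURNING id
   )
, sel AS (                                     -- locking clause in its own CTE,
   SELECT c.id                                 -- since FOR UPDATE is not allowed with UNION
   FROM   input_rows
   JOIN   chats c USING (usr, contact)
   FOR    UPDATE OF c
   )
SELECT 'i' AS source, id FROM ins              -- 'i' for 'inserted'
UNION  ALL
SELECT 's' AS source, id FROM sel;             -- 's' for 'selected'

-- ... further work that relies on the returned rows staying unchanged ...

COMMIT;  -- releases all locks; keep the transaction short
```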


Defend against deadlocks by inserting rows in a consistent order.
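A minimal sketch of inserting in a consistent order, assuming the same hypothetical temporary chats table: sorting the input by the columns of the unique index means concurrent transactions try to lock conflicting rows in the same order. In practice, the INSERT processes rows in the order the SELECT produces them; treat this as a common technique rather than a documented guarantee.

```sql
-- Hypothetical temporary table with the answer's column names.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id      serial PRIMARY KEY,
   usr     text NOT NULL,
   contact text NOT NULL,
   name    text,
   UNIQUE (usr, contact)
);

WITH input_rows(usr, contact, name) AS (
   VALUES
      (text 'foo2', text 'bar2', text 'bob2')  -- input arrives unsorted
    , ('foo1', 'bar1', 'bob1')
   )
INSERT INTO chats (usr, contact, name)
SELECT * FROM input_rows
ORDER  BY usr, contact                         -- same order in every transaction
ON     CONFLICT (usr, contact) DO NOTHING
RETURNING id;
```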

Explicit type casts for the first row of data in the free-standing VALUES expression may be inconvenient, and there are ways around them. You can use any existing relation (table, view, ...) as a row template; the target table is the obvious choice for this use case. Input data is then coerced to the appropriate types automatically, as in the VALUES clause of an INSERT:

WITH input_rows AS (
  (SELECT usr, contact, name FROM chats LIMIT 0)  -- only copies column names and types
   UNION ALL
   VALUES
      ('foo1', 'bar1', 'bob1')  -- no type casts here
    , ('foo2', 'bar2', 'bob2')
   )
   ...

This does not work for some data types.

The following variant works for all data types. When inserting into all (leading) columns of the table, you can also omit the column names. Assuming the table chats in this example consists of only the three columns used in the UPSERT:

WITH input_rows AS (
   SELECT * FROM (
      VALUES
      ((NULL::chats).*)         -- copies whole row definition
    , ('foo1', 'bar1', 'bob1')  -- no type casts needed
    , ('foo2', 'bar2', 'bob2')
      ) sub
   OFFSET 1
   )
   ...

Aside: don’t use reserved words such as "user" as identifiers. That’s a loaded footgun: unquoted, user is equivalent to CURRENT_USER (the current role name), not your column. Use legal, lowercase, unquoted identifiers. I replaced it with usr.

Answered by Erwin Brandstetter

Solution #2

I had the same problem and solved it by using DO UPDATE instead of DO NOTHING, even though there was nothing to update. In your case, it would look something like this:

INSERT INTO chats ("user", "contact", "name") 
       VALUES ($1, $2, $3), 
              ($2, $1, NULL) 
ON CONFLICT("user", "contact") 
DO UPDATE SET 
    name=EXCLUDED.name 
RETURNING id;

This query will return all rows, whether they were freshly inserted or had previously existed.
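A runnable sketch under an assumed schema (a temporary table with a serial id and a UNIQUE constraint on ("user", "contact"); the real definition is not shown). It confirms that every input row returns an id, and also shows a consequence worth knowing: SET name = EXCLUDED.name overwrites the existing value, including with the NULL from the second row.

```sql
-- Assumed schema, for illustration only.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id        serial PRIMARY KEY,
   "user"    text NOT NULL,
   "contact" text NOT NULL,
   "name"    text,
   UNIQUE ("user", "contact")
);
INSERT INTO chats ("user", "contact", "name")
VALUES ('alice', 'bob', 'first'), ('bob', 'alice', 'second');  -- ids 1 and 2

-- The statement above, with literals in place of $1, $2, $3.
INSERT INTO chats ("user", "contact", "name")
VALUES ('alice', 'bob', 'Alice and Bob')
     , ('bob', 'alice', NULL)
ON CONFLICT ("user", "contact")
DO UPDATE SET name = EXCLUDED.name
RETURNING id;
-- Returns ids 1 and 2, but 'first' is now 'Alice and Bob' and 'second' is NULL.
```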

Answered by Alextoni

Solution #3

WITH e AS(
    INSERT INTO chats ("user", "contact", "name") 
           VALUES ($1, $2, $3), 
                  ($2, $1, NULL) 
    ON CONFLICT("user", "contact") DO NOTHING
    RETURNING id
)
SELECT * FROM e
UNION
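    -- quote "user" (unquoted, user means CURRENT_USER) and fetch both input pairs
    SELECT id FROM chats WHERE ("user", "contact") IN (($1, $2), ($2, $1));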

The main benefit of ON CONFLICT DO NOTHING is that it prevents errors from being thrown, but it also returns no rows for the conflicting input. To get the existing id, we need another SELECT.

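If a row is skipped because of a conflict, the CTE returns nothing for it, and the second SELECT returns the existing row. If the insert succeeds, the second SELECT does not see the new row, because all parts of one statement see the same snapshot of the table, so no duplicates arise; UNION merely adds a deduplication step, and UNION ALL would work as well.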
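A runnable sketch of the corrected query under an assumed schema (temporary table, serial id, UNIQUE constraint on ("user", "contact")), with literals in place of the parameters. It counts the ids over UNION ALL to show that a successful insert does not produce duplicates.

```sql
-- Assumed schema, for illustration only.
DROP TABLE IF EXISTS pg_temp.chats;
CREATE TEMP TABLE chats (
   id        serial PRIMARY KEY,
   "user"    text NOT NULL,
   "contact" text NOT NULL,
   "name"    text,
   UNIQUE ("user", "contact")
);

WITH e AS (
    INSERT INTO chats ("user", "contact", "name")
    VALUES ('alice', 'bob', 'Alice and Bob')
         , ('bob', 'alice', NULL)
    ON CONFLICT ("user", "contact") DO NOTHING
    RETURNING id
)
SELECT count(*) AS ids_returned   -- 2, not 4: the SELECT does not see new rows
FROM  (
    SELECT id FROM e
    UNION ALL
    SELECT id FROM chats
    WHERE  ("user", "contact") IN (('alice', 'bob'), ('bob', 'alice'))
) sub;
```

Running the same statement a second time also returns 2; this time both ids come from the SELECT on chats.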

Answered by Yu Huang

Solution #4

As an extension of the INSERT statement, an upsert can be given one of two behaviors for the case of a constraint conflict: DO NOTHING or DO UPDATE.
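The answer does not show upsert_table. The following schema and data are an assumption chosen to be consistent with the outputs below: a UNIQUE constraint on sub_id gets PostgreSQL’s default name upsert_table_sub_id_key, and a pre-existing row (2, 2, 'inserted') makes both examples behave as shown.

```sql
-- Assumed schema and data, not part of the original answer.
DROP TABLE IF EXISTS pg_temp.upsert_table;
CREATE TEMP TABLE upsert_table (
   id     int PRIMARY KEY,
   sub_id int UNIQUE,        -- default constraint name: upsert_table_sub_id_key
   status text
);
INSERT INTO upsert_table VALUES (1, 1, 'inserted'), (2, 2, 'inserted');
```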

INSERT INTO upsert_table VALUES (2, 6, 'upserted')
   ON CONFLICT DO NOTHING RETURNING *;

 id | sub_id | status
----+--------+--------
 (0 rows)

Since no tuple was inserted, RETURNING returns nothing here either. With DO UPDATE, you can instead act on the tuple that caused the conflict. Here the conflict target is given explicitly as a constraint (ON CONFLICT ON CONSTRAINT), which determines what counts as a conflict:

INSERT INTO upsert_table VALUES (2, 2, 'inserted')
   ON CONFLICT ON CONSTRAINT upsert_table_sub_id_key
   DO UPDATE SET status = 'upserted' RETURNING *;

 id | sub_id |  status
----+--------+----------
  2 |      2 | upserted
(1 row)

Answered by Jaumzera

Solution #5

To return the id when inserting a single item, I’d probably use COALESCE:

WITH new_chats AS (
    INSERT INTO chats ("user", "contact", "name")
    VALUES ($1, $2, $3)
    ON CONFLICT("user", "contact") DO NOTHING
    RETURNING id
) SELECT COALESCE(
    (SELECT id FROM new_chats),
    (SELECT id FROM chats WHERE "user" = $1 AND "contact" = $2)  -- unquoted user means CURRENT_USER
);

When inserting multiple items, you can put the values in a separate WITH query and reference them later:

WITH chats_values("user", "contact", "name") AS (
    VALUES ($1, $2, $3),
           ($4, $5, $6)
), new_chats AS (
    INSERT INTO chats ("user", "contact", "name")
    SELECT * FROM chats_values
    ON CONFLICT("user", "contact") DO NOTHING
    RETURNING id
) SELECT id
    FROM new_chats
   UNION
  SELECT chats.id
    FROM chats, chats_values
   WHERE chats.user = chats_values.user
     AND chats.contact = chats_values.contact

Answered by João Haas

Post is based on https://stackoverflow.com/questions/34708509/how-to-use-returning-with-on-conflict-in-postgresql