Coder Perfect

INNER JOIN and LEFT SEMI JOIN are two different types of joins.

Problem

What makes an INNER JOIN different from an LEFT SEMI JOIN?

Why am I receiving two different results in the example below?

The result set of an INNER JOIN is much larger. Could someone please explain? I’m looking for names in table 1 that are only found in table 2.

SELECT name
FROM table_1 a
    INNER JOIN table_2 b ON a.name=b.name

SELECT name
FROM table_1 a
    LEFT SEMI JOIN table_2 b ON (a.name=b.name)

Asked by user3023355

Solution #1

An INNER JOIN can return data from both tables’ columns, as well as duplicate values of records on either side with multiple matches. A LEFT SEMI JOIN can only return columns from the left-hand table, and it returns one of each record from the left-hand table where the right-hand table has one or more matches (regardless of the number of matches). It’s the same as (in SQL standard):

SELECT name
FROM table_1 a
WHERE EXISTS(
    SELECT * FROM table_2 b WHERE (a.name=b.name))

If the right-hand column contains several matching rows, an INNER JOIN will return one row for each match on the right table, whereas an LEFT SEMI JOIN will only return data from the left table, regardless of the number of matching rows on the right side. That’s why your result has a different number of rows than expected.

A LEFT SEMI JOIN is the query to utilize in this case.

Answered by D Stanley

Solution #2

Assume there are two tables, TableA and TableB, each with only two columns (Id and Data) containing the following information:

TableA:

+----+---------+
| Id |  Data   |
+----+---------+
|  1 | DataA11 |
|  1 | DataA12 |
|  1 | DataA13 |
|  2 | DataA21 |
|  3 | DataA31 |
+----+---------+

TableB:

+----+---------+
| Id |  Data   |
+----+---------+
|  1 | DataB11 |
|  2 | DataB21 |
|  2 | DataB22 |
|  2 | DataB23 |
|  4 | DataB41 |
+----+---------+

a column with an inner join Id will return columns from both tables, but only the records that match:

.----.---------.----.---------.
| Id |  Data   | Id |  Data   |
:----+---------+----+---------:
|  1 | DataA11 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA12 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA13 |  1 | DataB11 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB21 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB22 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB23 |
'----'---------'----'---------'

On column Id, a left join (or left outer join) will return columns from both tables as well as matching records with records from the left table (null values from the right table):

.----.---------.----.---------.
| Id |  Data   | Id |  Data   |
:----+---------+----+---------:
|  1 | DataA11 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA12 |  1 | DataB11 |
:----+---------+----+---------:
|  1 | DataA13 |  1 | DataB11 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB21 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB22 |
:----+---------+----+---------:
|  2 | DataA21 |  2 | DataB23 |
:----+---------+----+---------:
|  3 | DataA31 |    |         |
'----'---------'----'---------'

Right Join (or Right Outer Join) on column Id returns columns from both tables as well as matching records from the right table (null values from the left table):

┌────┬─────────┬────┬─────────┐
│ Id │  Data   │ Id │  Data   │
├────┼─────────┼────┼─────────┤
│  1 │ DataA11 │  1 │ DataB11 │
│  1 │ DataA12 │  1 │ DataB11 │
│  1 │ DataA13 │  1 │ DataB11 │
│  2 │ DataA21 │  2 │ DataB21 │
│  2 │ DataA21 │  2 │ DataB22 │
│  2 │ DataA21 │  2 │ DataB23 │
│    │         │  4 │ DataB41 │
└────┴─────────┴────┴─────────┘

On the column, there is a full outside join. Id will return columns from both tables, as well as matching records from the left table (Null values from the right table) and the right table (Null values from the left table):

╔════╦═════════╦════╦═════════╗
║ Id ║  Data   ║ Id ║  Data   ║
╠════╬═════════╬════╬═════════╣
║  - ║         ║    ║         ║
║  1 ║ DataA11 ║  1 ║ DataB11 ║
║  1 ║ DataA12 ║  1 ║ DataB11 ║
║  1 ║ DataA13 ║  1 ║ DataB11 ║
║  2 ║ DataA21 ║  2 ║ DataB21 ║
║  2 ║ DataA21 ║  2 ║ DataB22 ║
║  2 ║ DataA21 ║  2 ║ DataB23 ║
║  3 ║ DataA31 ║    ║         ║
║    ║         ║  4 ║ DataB41 ║
╚════╩═════════╩════╩═════════╝

Semi-Right Join on a column basis Id will only return columns from the left table, as well as matching records from the left table:

┌────┬─────────┐
│ Id │  Data   │
├────┼─────────┤
│  1 │ DataA11 │
│  1 │ DataA12 │
│  1 │ DataA13 │
│  2 │ DataA21 │
└────┴─────────┘

Answered by Abhishek Bansal

Solution #3

I tried it in Hive and got the following results.

table1

table2

Inner Join

Left Join

Left Semi Join

Answered by Kumar

Solution #4

All of the answers shown above are correct. However, while envisioning LEFT SEMI JOIN in practice, it helps to associate the mental model of a filter.

The answer is a subset of LEFT TABLE rows that match rows in RIGHT TABLE.

Answered by dsculptor

Post is based on https://stackoverflow.com/questions/21738784/difference-between-inner-join-and-left-semi-join