Problem
What makes an INNER JOIN different from an LEFT SEMI JOIN?
Why am I receiving two different results in the example below?
The result set of an INNER JOIN is much larger. Could someone please explain? I’m looking for names in table 1 that are only found in table 2.
SELECT name
FROM table_1 a
INNER JOIN table_2 b ON a.name=b.name
SELECT name
FROM table_1 a
LEFT SEMI JOIN table_2 b ON (a.name=b.name)
Asked by user3023355
Solution #1
An INNER JOIN can return data from both tables’ columns, as well as duplicate values of records on either side with multiple matches. A LEFT SEMI JOIN can only return columns from the left-hand table, and it returns one of each record from the left-hand table where the right-hand table has one or more matches (regardless of the number of matches). It’s the same as (in SQL standard):
SELECT name
FROM table_1 a
WHERE EXISTS(
SELECT * FROM table_2 b WHERE (a.name=b.name))
If the right-hand column contains several matching rows, an INNER JOIN will return one row for each match on the right table, whereas an LEFT SEMI JOIN will only return data from the left table, regardless of the number of matching rows on the right side. That’s why your result has a different number of rows than expected.
A LEFT SEMI JOIN is the query to utilize in this case.
Answered by D Stanley
Solution #2
Assume there are two tables, TableA and TableB, each with only two columns (Id and Data) containing the following information:
TableA:
+----+---------+
| Id | Data |
+----+---------+
| 1 | DataA11 |
| 1 | DataA12 |
| 1 | DataA13 |
| 2 | DataA21 |
| 3 | DataA31 |
+----+---------+
TableB:
+----+---------+
| Id | Data |
+----+---------+
| 1 | DataB11 |
| 2 | DataB21 |
| 2 | DataB22 |
| 2 | DataB23 |
| 4 | DataB41 |
+----+---------+
a column with an inner join Id will return columns from both tables, but only the records that match:
.----.---------.----.---------.
| Id | Data | Id | Data |
:----+---------+----+---------:
| 1 | DataA11 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA12 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA13 | 1 | DataB11 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB21 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB22 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB23 |
'----'---------'----'---------'
On column Id, a left join (or left outer join) will return columns from both tables as well as matching records with records from the left table (null values from the right table):
.----.---------.----.---------.
| Id | Data | Id | Data |
:----+---------+----+---------:
| 1 | DataA11 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA12 | 1 | DataB11 |
:----+---------+----+---------:
| 1 | DataA13 | 1 | DataB11 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB21 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB22 |
:----+---------+----+---------:
| 2 | DataA21 | 2 | DataB23 |
:----+---------+----+---------:
| 3 | DataA31 | | |
'----'---------'----'---------'
Right Join (or Right Outer Join) on column Id returns columns from both tables as well as matching records from the right table (null values from the left table):
┌────┬─────────┬────┬─────────┐
│ Id │ Data │ Id │ Data │
├────┼─────────┼────┼─────────┤
│ 1 │ DataA11 │ 1 │ DataB11 │
│ 1 │ DataA12 │ 1 │ DataB11 │
│ 1 │ DataA13 │ 1 │ DataB11 │
│ 2 │ DataA21 │ 2 │ DataB21 │
│ 2 │ DataA21 │ 2 │ DataB22 │
│ 2 │ DataA21 │ 2 │ DataB23 │
│ │ │ 4 │ DataB41 │
└────┴─────────┴────┴─────────┘
On the column, there is a full outside join. Id will return columns from both tables, as well as matching records from the left table (Null values from the right table) and the right table (Null values from the left table):
╔════╦═════════╦════╦═════════╗
║ Id ║ Data ║ Id ║ Data ║
╠════╬═════════╬════╬═════════╣
║ - ║ ║ ║ ║
║ 1 ║ DataA11 ║ 1 ║ DataB11 ║
║ 1 ║ DataA12 ║ 1 ║ DataB11 ║
║ 1 ║ DataA13 ║ 1 ║ DataB11 ║
║ 2 ║ DataA21 ║ 2 ║ DataB21 ║
║ 2 ║ DataA21 ║ 2 ║ DataB22 ║
║ 2 ║ DataA21 ║ 2 ║ DataB23 ║
║ 3 ║ DataA31 ║ ║ ║
║ ║ ║ 4 ║ DataB41 ║
╚════╩═════════╩════╩═════════╝
Semi-Right Join on a column basis Id will only return columns from the left table, as well as matching records from the left table:
┌────┬─────────┐
│ Id │ Data │
├────┼─────────┤
│ 1 │ DataA11 │
│ 1 │ DataA12 │
│ 1 │ DataA13 │
│ 2 │ DataA21 │
└────┴─────────┘
Answered by Abhishek Bansal
Solution #3
I tried it in Hive and got the following results.
table1
table2
Inner Join
Left Join
Left Semi Join
Answered by Kumar
Solution #4
All of the answers shown above are correct. However, while envisioning LEFT SEMI JOIN in practice, it helps to associate the mental model of a filter.
The answer is a subset of LEFT TABLE rows that match rows in RIGHT TABLE.
Answered by dsculptor
Post is based on https://stackoverflow.com/questions/21738784/difference-between-inner-join-and-left-semi-join