facebook google plus twitter
Webucator's Free Advanced Oracle SQL Queries Tutorial

Lesson: Joining Tables

Welcome to our free Advanced Oracle SQL Queries tutorial. This tutorial is based on Webucator's Advanced Oracle SQL Queries course.

Oracle SQL queries are not limited to selecting data from a single table. Several tables can be accessed within a query and the data aggregated into a single result set which is returned to the user. The relationship between two tables is known as a join and consists of a mapping of values between specified columns in each table. In this chapter you will learn about various types of joins and their application to data retrieval.

Lesson Goals

  • Review Joins.
  • Learn about equijoins.
  • Learn about reflexive joins.
  • Learn about non-key joins.
  • Learn about Outer join About Ansi/Iso SQL99.
  • Learn about cross joins.
  • Learn about natural joins.
  • Learn about outer joins.
  • Learn about semijoins and antijoins.
  • Learn about Using Named subqueries.

Review of Joins

A join is the means of relating the data in two tables together.

An equijoin has a join condition containing an equality operator. If the joined columns have the same name, the join can be referred to as a natural join. The majority of the time, joins are made between two different tables with a foreign key to primary key relationship. A single table can even be joined to itself if the two keys are in that table (a reflexive join). However, it is possible to join a table using columns other than primary or foreign keys (a non-key join). It is even possible to join tables without using the equality operator (e.g., an anti-join or semi-join) or to join with the results of a query rather than a table (named subqueries).

Equijoins

A number of different types of joins use the equals operator. The most common is the inner join. Oracle has an "old" syntax for joins but also supports ANSI standard SQL joins. There are actually some additional distinctions for different types of equijoins. Oracle's "old" syntax for the inner join conforms to earlier ANSI join syntax. Oracle's "old" syntax for the outer join is proprietary. It is based on the earlier ANSI inner join syntax. The earlier ANSI inner join syntax is often referred to as a "Theta Join". The newer ANSI join syntax is referred to as a "SQL99 Join".

Inner Joins

With an inner join, columns with a matching row in each joined table are returned. An inner join is the default type of join, so the INNER keyword may be omitted in the SQL99 version.

For example, list country and region information.

Code Sample:

Joining-Tables/Demos/inner_join.sql
-- Earlier ANSI Join Syntax (Theta Join)

SELECT * 
FROM regions r, countries c 
WHERE r.region_id = c.region_id;

-- New (ANSI SQL99) Syntax - the INNER keyword is optional

SELECT * 
FROM regions r 
INNER JOIN countries c 
ON r.region_id = c.region_id;

An inner join causes columns that have a matching row in each joined table to be returned. Traditionally this type of query was specified in Oracle as part of the WHERE clause using the earlier ANSI standard (Theta Join) syntax. SQL99 ANSI standard join syntax was added in Oracle 9i. An inner join can now be specified in a separate JOIN clause rather than including it among filtering predicates in the WHERE clause.

Outer Joins

Code Sample:

Joining-Tables/Demos/count_employees.sql
--
-- This query returns a count of 107 records in the employees table
--
SELECT count(*) 
FROM employees;
 
--   
--When we make an inner join to departments, we only get 106!
--
SELECT count(*) 
FROM employees e 
INNER JOIN departments d ON e.department_id = d.department_id;

--
-- What happened to the one record?  
-- As it turns out, one record has a null department_id.
--
SELECT * 
FROM employees 
WHERE department_id IS NULL;

Start by checking how many records are in the employees table. Next, create an inner join with the department table and check the count again. There appears to be a record missing. The missing record has a null in the department_id field, and therefore no corresponding row in the departments table. By using a left outer join, this record is returned as well.

In order to restrict the number of rows returned in the following queries, filtering predicate expressions (criteria) are added to each WHERE clause. However, keep in mind that an outer query returns all of the records returned by an inner query PLUS certain additional records from one or both of the tables.

Code Sample:

Joining-Tables/Demos/outer_join.sql
--
-- The employee can be retrieved when the department_id is ignored...
--
SELECT first_name, last_name 
FROM employees e    
WHERE e.employee_id = 178;

--   
-- ...but the employee is not returned when a join is made to the 
-- department table because there is no matching record in the
-- department table
--
SELECT first_name, last_name 
FROM employees e    
INNER JOIN departments d ON d.department_id = e.department_id
WHERE e.employee_id = 178;

--   
-- By using a LEFT OUTER JOIN we include all records that are included
-- in an inner join plus all records in the employees table.
--    
SELECT first_name, last_name 
FROM employees e    
LEFT OUTER JOIN departments d ON d.department_id = e.department_id
WHERE e.employee_id = 178;

--    
-- In order to demonstrate the right and full outer joins, it is necessary
-- to add a record to the departments table that does not have any 
-- employee records associated with it.
--    
INSERT INTO departments (
	department_id, department_name, manager_id, location_id
) 
VALUES 
(
	999,'Vacant Dept', NULL, NULL
);
    
--
--  This query returns no records
--
SELECT first_name, last_name, department_name
FROM employees e 
INNER JOIN departments d ON d.department_id = e.department_id
WHERE e.employee_id = 178
OR d.department_id = 999; 

--
-- Using a LEFT OUTER JOIN records are returned from the employee 
-- table with no associated record in the department table
--
SELECT first_name, last_name, department_name
FROM employees e 
LEFT OUTER JOIN departments d ON d.department_id = e.department_id
WHERE e.employee_id = 178
OR d.department_id = 999; 

--
-- Using a RIGHT OUTER JOIN records are returned from the deparment 
-- table with no associated record in the employees table
--
SELECT first_name, last_name, department_name
FROM employees e 
RIGHT OUTER JOIN departments d ON d.department_id = e.department_id
WHERE e.employee_id = 178
OR d.department_id = 999;    
    
--
-- This query returns records in the deparment table with no associated
-- record in the employees table as well as records in the employees
-- table with no record in the departments table
--
SELECT first_name, last_name, department_name
FROM employees e 
FULL OUTER JOIN departments d ON d.department_id = e.department_id
WHERE e.employee_id = 178
OR d.department_id = 999;
	
--
-- Using Oracle proprietary syntax, a left outer join can also be specified using (+) to indicate
-- the table column where a null is acceptable
--

SELECT first_name, last_name, department_name
FROM employees e , departments d
WHERE (e.employee_id = 178
OR d.department_id = 999)
AND d.department_id = e.department_id(+)

A generally accepted "Best Practice" is to avoid nulls in the database. However, real world situations result in occasions where parent records have no associated child records. An employee may not have an assigned department (if they have retired), and a department may not have an associated employee (if the department is no longer in use but still has relevant data associated with it).

These situations require the use of outer joins as illustrated in these examples.

Cross Joins

Code Sample:

Joining-Tables/Demos/cross_join.sql
--
--Theta Join syntax
--
SELECT first_name, last_name, department_name
FROM employees, departments
ORDER BY first_name, last_name, department_name;

A cross join (or Cartesian Product) will return a result set where each row from the first table is combined with each row from the second table. This is not often required in practice and is used intentionally rather sparingly.

Reflexive Joins

A reflexive join (or self-join), is a relationship in which a table is joined to itself. The employees table has both employees and employee managers in it. A self join can be made using this table and using the manager_id and the employee_id in the join.

Code Sample:

Joining-Tables/Demos/self_join.sql
SELECT first_name, last_name 
FROM employees 
WHERE employee_id=103;

SELECT first_name, last_name 
FROM employees 
WHERE manager_id=103;

--
--If we want to return information about the manager and his employees
--in the same query, we must use a self-join or reflexive join
--

SELECT	e.first_name, 
		e.last_name, 
		m.first_name AS manager_first_name, 
		m.last_name AS manager_last_name 
FROM employees e 
INNER JOIN employees m 
ON e.manager_id = m.employee_id;

The first query is a simple query to return the manager with employee id 103's first name and last name. The next query shows the names of the people that are managed by this manager. The third query is a reflexive query in which the employee table is joined to itself in order to return each employee's name and their manager name in the same row.

Hierarchical Queries

A special type of self join is available in Oracle called a hierarchical query. This type of query is useful for representing recursive relationships which might exist within the data.

Code Sample:

Joining-Tables/Demos/hierarchical_query.sql
SELECT employee_id, manager_id
FROM employees
START WITH manager_id is null
CONNECT BY PRIOR employee_id = manager_id;

SELECT employee_id, manager_id, sys_connect_by_path(last_name,'->') AS path
FROM employees
START WITH manager_id is null
CONNECT BY PRIOR employee_id = manager_id;

The first query produces hierarchical query results by using the CONNECT BY PRIOR clause. It orders the results of the employee table so that employees are nested under their assigned managers. The next query uses the SYS_CONNECT_BY_PATH function to provide a representation of the hierarchy using last names.

Non-Key Join

Joins are usually specified on primary key and foreign key columns. However, SQL is not limited to joining on such columns. Any columns can be joined as long as the datatypes match. In fact, even a function can be used as part of join.

Just for fun let's see if the first two characters of an employee's first name happen to match any country code; uppercase those two characters since country codes are uppercased.

Code Sample:

Joining-Tables/Demos/nonkey_join.sql
SELECT country_id, country_name, first_name
FROM employees e
INNER JOIN countries c ON c.country_id = upper(substr(e.first_name,1,2));

This is a somewhat contrived example since the HR schema is well designed and relatively simple. It involves the use of functions in the joined column. Be aware, there is a potential for performance problems if appropriate indexes have not been defined. Oracle has a feature called function based indexes which can be helpful when using deterministic functions in this manner under some circumstances.

Natural Joins

Code Sample:

Joining-Tables/Demos/natural_join.sql
--Query using INNER JOIN syntax
SELECT * 
FROM regions r 
INNER JOIN countries c ON c.region_id = r.region_id;

--Same rows return using NATURAL JOIN syntax (***NOT RECOMENDED***)
SELECT * 
FROM regions 
NATURAL JOIN countries;

A natural join involves tables joined by columns with the same name. The NATURAL JOIN clause implicitly creates a join on columns with the same column name between the joined tables. The returned results (in the case of a SELECT *) contain only one column for each pair of equally-named columns.

Although it is useful to be aware of NATURAL JOIN syntax, it is best to avoid its use in practice. At best, the columns being used to make the join are not explicitly identified, and this ambiguity can lead to confusion when attempting to debug a query. If a column is renamed, a query referencing the column in a NATURAL JOIN will remain valid, however the results of the query will change.

Semijoins and Antijoins

A semi-join returns rows from a table where matches are found in the joined table. It uses the IN or EXISTS operators. An anti-join returns rows from a table for which there are no corresponding rows in the joined table. It uses the NOT IN or NOT EXISTS keywords.

For example, list all of the job ids for all of the jobs that have ever been assigned to an employee.

Code Sample:

Joining-Tables/Demos/semijoin.sql
SELECT * 
FROM jobs 
WHERE job_id IN (
	SELECT job_id 
	FROM job_history);

This example is essentially the same as an inner query. However, it provides a different type of flexibility unavailable to an inner join by adding criteria to the WHERE clause of the subquery.

Now list all of the job ids for all of the jobs that have never been assigned to an employee.

Code Sample:

Joining-Tables/Demos/antijoin.sql
SELECT * 
FROM jobs 
WHERE job_id NOT IN (
	SELECT job_id 
	FROM job_history
);

This is simply the inverse of the previous example.

Using Named Subqueries

A named subquery is also known as inline view. It can be joined in the same manner as a regular table.

Code Sample:

Joining-Tables/Demos/named_subquery.sql
--
--  The following query returns a list of employee managers as well as 
--  the department id to which each is assigned. Note that the employee 
--  managers utilize the self relationship in the EMPLOYEES table. Also 
--  note that an employee manager is different than a department manager.
--
SELECT first_name, last_name, department_id 
FROM employees WHERE 
manager_id IS NULL 
OR employee_id IN 
	(SELECT manager_id FROM employees);

--
-- This query can be used as a named subquery with an alias of managers.
-- It can then be joined to the departments table to see a list of each
-- employee manager and the name of the department to which each is assigned.
--
SELECT 	managers.first_name, 
	managers.last_name, 
	d.department_name
FROM
(
	SELECT first_name, last_name, department_id 
	FROM employees WHERE 
	manager_id IS NULL 
	OR employee_id IN (
		SELECT manager_id 
		FROM employees
	)
	) managers 
INNER JOIN departments d ON d.department_id = managers.department_id;

This example starts with a query that makes sense for use as an inline view. It is a listing of all employee managers. This query is then used in the second query in the place of a table and is joined with the departments table to provide the final set of results.

Join Exercises

Duration: 10 to 15 minutes.
  1. Create a query that returns all employee rows. Assign an alias e to the table.
  2. Modify the query so that it accesses both the employees and departments table (using the Theta syntax, but "accidentally" forget the join predicate.). Make sure to add the alias e to the asterisk in the SELECT statement.
  3. Modify the query so that it uses an inner join between the two tables using the department_id columns (thus utilizing the department assignment relationship). Note that this is a natural join (but does not use the ambiguous NATURAL JOIN syntax).
  4. Rewrite the query so that it returns the same rows but uses a semijoin.

Solution:

Joining-Tables/Solutions/join_solutions.sql
--
-- All employee records
--
SELECT * 
FROM employees e;

--
-- This is a cross-join which the cartesian product of the two tables.
--
SELECT e.*, d.department_id 
FROM employees e, departments d;

--
-- An inner join
--
SELECT e.* 
FROM employees e 
INNER JOIN departments d ON e.department_id = d.department_id;

--
-- The same results from a semijoin
--
SELECT e.* 
FROM employees e 
WHERE e.department_id IN ( 
	SELECT d.department_id
	FROM departments d
);