Often times during development, we come across requirements which ends up producing complex SQL statements. One way to make complex query more readable and manageable is by making use of views.
One issue with this is that, view stores the schema definition permanently in the database.
So if the result a view produces will not be used in other procedures, then creating a view would be a bad design choice.
Instead of creating a view, we could also make use of temporary result sets such as a temporary table or a derived table. The issue with using temporary table is that it’s a multi-step approach.
First you would need to define a temporary table, then you would need to populate the temporary table and after using the temporary table, you need to drop the temporary table.
With a derived table, although it has limited scope and disappears after the query has been completed, it affects readability of a query.
A better solution in this case is to make use of Common Table Expression (CTE).
CTE improves readability and makes a complex query more manageable. It is similar to a derived table in that its scope is also limited to a single query.
But unlike a derived table, it has the ability to run recursive queries by referencing itself multiple times within the same query.
SYNTAX:
WITH
CTE_Name[Column1, Column2 …]
AS
(
CTE_Definition
)
OuterQuery_Definition
You can divide the syntax into three parts:
WITH Clause
Definition of a CTE needs to be defined within a WITH clause. You can define multiple CTE’s within a single WITH clause.
It is important to note here that the query preceding the WITH clause needs to be terminated using a semicolon ”;”.
CTE Name and Definition
Name of the CTE is used to identify itself. If there exists multiple CTE’s within a WITH clause, then each of them must have unique names.
CTE name is followed by optional column names. If column names are not specified, CTE will use the column names from the SELECT clause in the CTE definition.
CTE definition is simply one or more SELECT statements which populates the CTE.
Outer Query
This is the query which immediately follows the CTE definition. If there exists multiple CTE definition’s within a WITH clause, then Outer query will appear after defining all the CTE’s.
There are two types of CTE in SQL SERVER:
- Non recursive CTE
- Recursive CTE
Non Recursive CTE
Non recursive CTE’s are simply CTE’s which does not reference itself.
Before analyzing an example using CTE, lets first look into a query which does not use CTE.
For example, you have 3 tables in your database: Customer, CustomerProduct and Product.
Customer table contains details of all customers, CustomerProduct table contains the product each customer has bought and Product table contains details of each product.
The requirement is that, we need to find total amount spent by customers who are at least 30 years old and has bought products worth $200 or more.
So without using CTE, we might create a query like this:
SELECT C.Name, C.Age, C.City, C.State, A.TotalSpent
FROM Customer AS C
INNER JOIN (
SELECT CP.CustomerID, SUM(P.Price)
AS TotalSpent
FROM CustomerProduct AS CP
INNER JOIN Product AS P
ON CP.ProductID = P.ProductID
WHERE P.Price > 200
GROUP BY CP.CustomerID
) AS A ON C.CustomerID = A.CustomerID
WHERE C.Age > 30
The above query makes use of a derived table. The derived table in this case returns list of customer id and the total amount spent by each customer on products worth more than $200.
The main query joins the derived table, filters the customers who are older than 30 years and finally selects the customer information and the total amount spent by the customer.
As you can see, the more complex derived table we have, the more it affects the readability. A better approach is to make use of CTE as shown below.
WITH
AmountSpent(CustomerID, TotalSpent)
AS
(
SELECT CP.CustomerID, SUM(P.Price)
FROM CustomerProduct AS CP
INNER JOIN Product AS P
ON CP.ProductID = P.ProductID
WHERE P.Price > 200
GROUP BY CP.CustomerID
)
SELECT C.Name, C.Age, C.City, C.State, A.TotalSpent
FROM AmountSpent AS A
INNER JOIN Customer AS C
ON C.CustomerID = A.CustomerID
WHERE C.Age > 30
In the above query, AmountSpent is the name of the CTE. Following the CTE name, we have CustomerID and TotalSpent which are the column names for the CTE.
CTE definition in the above query is a SELECT statement which returns list of customer id and the total amount spent by each customer on products worth more than $200.
This definition is similar to the derived table we saw in the previous example. The output of this query gets inserted into the CTE AmountSpent.
Next we have an outer query, which joins the customer table with the CTE AmountSpent,
filters the customers who are older than 30 years and finally selects the customer information and the total amount spent by the customer.
You can also add multiple CTE definitions within the WITH clause,
and the final query will be able to reference all of them making the query much more readable compared to having multiple derived tables within a query.
Recursive CTE
As the name suggests, recursive CTE provides the ability to reference itself multiple times. It is typically used to obtain hierarchical or tree structured data.
SYNTAX:
WITH
CTE_Name[Column1, Column2 …]
AS
(
AnchorQuery_Definition
UNION ALL
RecursiveQuery_Definition
)
OuterQuery_Definition
Definition of Recursive CTE can been divided in to two parts.
Anchor Query
Anchor query is the one which gets invoked only once and does not reference the CTE.
Recursive Query
Recursive query on the other hand references the CTE and will keep getting executed multiple times until it returns an empty data set.
Let’s take a look at an example. Suppose we are interested in finding hierarchy of managers for a particular employee, we can make use of recursive CTE as shown below.
WITH
EmployeeManager (EmployeeID, ManagerID)
AS
(
SELECT EmployeeID, ManagerID
FROM Employee
WHERE EmployeeID = 10
UNION ALL
SELECT E.EmployeeID, E.ManagerID
FROM Employee AS E
INNER JOIN EmployeeManager AS M
ON E.EmployeeID = M.ManagerID
)
SELECT E.EmployeeID, E.FirstName + ' ' + E.LastName
AS Name, E.Title, E.ManagerID
FROM EmployeeManager AS M
INNER JOIN Employee AS E
ON M.EmployeeID = E.EmployeeID
In the above query, EmployeeManager is the name of the CTE. It has two columns, EmployeeID and ManagerID. Anchor query is the first query in the CTE definition and is the one which gets executed first.
It selects EmployeeID and ManagerID from Employee table. The query inserts a single row with EmployeeID 10 in to the CTE.
Now that we have the primary employee record in the CTE, what we need now is to find the manager of this employee and insert the manager record in the CTE.
To achieve this, the second query which is the recursive query in the CTE gets executed.
The recursive query references the existing record in the CTE and selects the EmployeeID and ManagerID of the manager of the primary employee.
At this point, the CTE would have two records, the first record is the primary employee with EmployeeID 10 and the second record is the immediate manager of the primary employee.
Since the recursive query did not return an empty data set, the recursive query again executes by referencing the CTE (which now has 2 records) and inserts managers of the existing records in the CTE.
This continues until the recursive query inserts all the managers in the hierarchy in to the CTE.
Once the CTE has been completely populated, the control goes to the outer query, which references the CTE and displays details of those employees in the CTE.