Subqueries and CTEs

JOINs and aggregation cover most queries you will ever write. But some questions need a query inside a query. “Which books have more reviews than average?” You have to compute the average first, then filter by it. “Which authors have at least one five-star review?” You have to check related data that lives across multiple tables. These layered questions are where subqueries and CTEs come in.

In this lesson we will start with subqueries, because they are the simpler tool and show up in almost every codebase. Then we will move to Common Table Expressions (CTEs), which let you break complex logic into named, readable steps. By the end you will have a clear sense of when to reach for each one.

Subqueries

A subquery is exactly what it sounds like: a query nested inside another query. The inner query runs first, and its result feeds the outer query. There are three common shapes.

Scalar subquery returns one value.

Code along

-- The most recently published book
SELECT title, published_at
FROM books
WHERE published_at = (SELECT MAX(published_at) FROM books);

Trace through this one. The inner SELECT MAX(published_at) FROM books runs first and returns a single date (the latest publication date). The outer query then finds the book whose published_at equals that date. The inner query produced exactly one value, which is why we call it “scalar.”

IN subquery returns a list of values.

Code along

-- Books by authors who have more than 2 books
SELECT title
FROM books
WHERE author_id IN (
  SELECT author_id
  FROM books
  GROUP BY author_id
  HAVING COUNT(*) > 2
);

Here the inner query returns a list of author_id values (the ones who have written more than two books). The outer query uses WHERE author_id IN (...) to find every book whose author appears in that list. If the inner query returns five author ids, the outer query checks each book against those five.

EXISTS subquery checks whether any rows match.

Code along

-- Authors who have at least one 5-star review on any book
SELECT name
FROM authors
WHERE EXISTS (
  SELECT 1
  FROM books
  JOIN reviews ON reviews.book_id = books.id
  WHERE books.author_id = authors.id AND reviews.rating = 5
);

EXISTS returns true as soon as the inner query finds at least one matching row. Notice how the inner query references authors.id from the outer query. That makes this a correlated subquery: the inner query runs once per outer row, each time checking “does this author have any 5-star reviews?” EXISTS is often faster than IN because it can stop scanning the moment it finds a single match.

Common Table Expressions (CTEs)

Subqueries work well for one or two levels of nesting. But once you start stacking them three or four deep, queries become painful to read. You have to start from the innermost part and work outward, which is backwards from how we naturally read code.

CTEs solve that. A CTE is a named temporary result set defined with the WITH keyword. You name each step, then compose the steps top to bottom. Compare these two versions of the same query:

Code along

-- Without CTE: nested and hard to follow
SELECT title, avg_rating
FROM books
JOIN (
  SELECT book_id, AVG(rating) AS avg_rating
  FROM reviews
  GROUP BY book_id
) AS book_ratings ON books.id = book_ratings.book_id
WHERE avg_rating >= 4.0;

-- With CTE: named steps, clear flow
WITH book_ratings AS (
  SELECT book_id, AVG(rating) AS avg_rating
  FROM reviews
  GROUP BY book_id
)
SELECT books.title, book_ratings.avg_rating
FROM books
JOIN book_ratings ON books.id = book_ratings.book_id
WHERE book_ratings.avg_rating >= 4.0;

The WITH book_ratings AS (...) block defines a temporary result set called book_ratings. The main query that follows can treat book_ratings like any other table. Worth noting: the CTE is not stored anywhere on disk. It only exists for the duration of that one query.

Both versions return the same result. But read them out loud. The CTE version tells a story: “First, compute book ratings. Then, find books with high ratings.” The nested version makes you parse inside-out before you understand what it is doing. On a team, the CTE version saves your reviewers a lot of time.

Multiple CTEs

You can also chain CTEs together for multi-step analysis. Each CTE can reference the ones defined before it:

Code along

WITH
  -- Step 1: Calculate review stats per book
  book_stats AS (
    SELECT
      book_id,
      AVG(rating) AS avg_rating,
      COUNT(*) AS review_count
    FROM reviews
    GROUP BY book_id
  ),
  -- Step 2: Find popular books (high rating + many reviews)
  popular_books AS (
    SELECT book_id
    FROM book_stats
    WHERE avg_rating >= 4.0 AND review_count >= 3
  )
-- Step 3: Get details for popular books
SELECT books.title, authors.name AS author_name, bs.avg_rating, bs.review_count
FROM popular_books pb
JOIN books ON pb.book_id = books.id
JOIN authors ON books.author_id = authors.id
JOIN book_stats bs ON pb.book_id = bs.book_id
ORDER BY bs.avg_rating DESC;

Notice how readable this is. Step 1 computes review stats per book. Step 2 uses those stats to pick out popular books. Step 3 joins back to the book and author details to produce the final output. Each step has a name that describes what it does, and the whole query reads top to bottom. That is exactly how production analytics queries get written.

When to use what

Three options, three questions to help you pick.

Subquery: Good for simple filters like WHERE id IN (SELECT ...). Gets messy when nested deeply.

CTE: Good for complex queries with multiple logical steps. Readable and self-documenting. Reach for this whenever the query has two or more distinct steps.

JOIN: Good for combining related tables in the result. The primary way to pull data from multiple tables into one output.

Rules of thumb: if you need one query’s result as a filter for another, use a subquery or a CTE. If the query has multiple logical steps, use a CTE. If you need data from multiple tables in the output, use a JOIN. And if a query is getting hard to read, a CTE will almost always make it clearer.

Exercises

Exercise 1: Write a subquery to find books by the most prolific author (the author with the most books).

Exercise 2: Rewrite the same query as a CTE. Which version is more readable?

Exercise 3: Write a CTE that finds the top-rated book in each genre.

Exercise 4: Write a multi-CTE query: find authors whose books have an average rating above 4.0, then list their books sorted by rating.

What is the main advantage of CTEs over subqueries?

Our queries can now answer sophisticated questions: JOINs combine tables, aggregation gives us summaries, and CTEs let us build multi-step analysis that still reads cleanly. But on large tables, queries like these can get slow. A query that runs in milliseconds on a hundred rows can take seconds on a million. In the next section, we will look at indexes, query plans, transactions, and the SQLite settings that every real application needs.

Access Required

Subqueries and CTEs

Subqueries

Common Table Expressions (CTEs)

Multiple CTEs

When to use what

Exercises