Sorting, grouping, and aggregates

So far, when we run a SELECT, the rows come back in whatever order SQLite feels like returning them. That is fine for quick checks, but in a real application you almost always need the results sorted. You also often need summaries. How many rows match? What is the total? What is the average?

This lesson covers the tools that transform raw rows into useful information. These are the building blocks behind every dashboard, leaderboard, and report you have ever seen.

Sorting with ORDER BY

To control the order of results, add ORDER BY:

Code along

SELECT * FROM users ORDER BY name;

This sorts alphabetically by name, ascending (A to Z). ASC (ascending) is the default, so you rarely need to write it explicitly.

To reverse the order, add DESC (descending):

Code along

SELECT * FROM users ORDER BY created_at DESC;

This shows the most recently created users first. You will use this pattern constantly. Any time you want “newest first,” it is ORDER BY ... DESC.

You can sort by multiple columns. SQLite sorts by the first column, then breaks ties with the second:

Code along

SELECT * FROM users ORDER BY name ASC, created_at DESC;

This sorts by name alphabetically. When two users have the same name, the one created more recently comes first. The second sort column only matters when there are ties in the first.

Aggregate functions

Aggregate functions compute a single value from a set of rows. You have already seen count(*). Here are the most common ones:

Code along

-- Count all rows
SELECT count(*) FROM users;

-- Count rows where a specific column is not NULL
SELECT count(bio) FROM users;

-- Minimum and maximum
SELECT min(created_at), max(created_at) FROM users;

-- Sum and average (useful with numeric columns)
SELECT sum(id), avg(id) FROM users;

Let’s look at a subtle but important difference. count(*) counts all rows, regardless of what is in them. count(bio) only counts rows where bio is not NULL. If five users exist but only two have filled in their bio, count(*) returns 5 and count(bio) returns 2. The distinction matters when your data has gaps.

sum() and avg() are most meaningful with numeric data like prices, quantities, or scores. Using them on id here is just for illustration. You will use them naturally once you have tables with numeric columns.

GROUP BY

GROUP BY is where things get interesting. It splits rows into groups and applies aggregate functions to each group separately.

Our schema only has users so far, so let’s group on something that users actually offers: whether or not a user has filled in their bio. A CASE expression turns that question into a simple label we can group by.

Code along

SELECT
  CASE WHEN bio IS NULL THEN 'no bio' ELSE 'has bio' END AS status,
  count(*) AS user_count
FROM users
GROUP BY status;

Here is what happens step by step:

SQLite looks at every row in users.
For each row, it evaluates the CASE expression, which produces either 'no bio' or 'has bio'.
It groups rows by that value. All users with 'no bio' form one group, all users with 'has bio' form another.
For each group, it runs the aggregate function. In this case, it counts the rows.
The result has one row per group.

status   user_count
-------  ----------
has bio  4
no bio   2

The AS status and AS user_count parts give each computed column a readable name. Without them, the columns would be labeled with the raw expressions, which works but is harder to read.

HAVING

Here is a question. What if you only want to see groups that have more than 3 users? You might think to use WHERE, but WHERE filters individual rows before grouping. The count does not exist yet at that point.

That is what HAVING is for. It filters groups after they have been formed:

Code along

SELECT
  CASE WHEN bio IS NULL THEN 'no bio' ELSE 'has bio' END AS status,
  count(*) AS user_count
FROM users
GROUP BY status
HAVING user_count > 3;

This returns only groups where the count is greater than 3. Think of it this way. WHERE decides which rows go into the groups. HAVING decides which groups appear in the final result.

Putting it all together

A SELECT statement can use all of these clauses together. The order is always:

SELECT columns
FROM table
WHERE row_conditions
GROUP BY grouping_columns
HAVING group_conditions
ORDER BY sort_columns
LIMIT count OFFSET skip;

This order is not a suggestion. It is required by SQL syntax. Rearranging them causes a syntax error.

Here is a realistic example that uses everything:

Code along

SELECT
  CASE WHEN bio IS NULL THEN 'no bio' ELSE 'has bio' END AS status,
  count(*) AS user_count
FROM users
WHERE created_at > '2025-01-01'
GROUP BY status
HAVING user_count >= 2
ORDER BY user_count DESC
LIMIT 10;

Let’s read it like a sentence: “From users created after the start of 2025, group them by whether they have a bio, keep only groups with 2 or more users, sort by user count descending, and return the top 10.”

That single query answers the question “among recent signups, how many have filled in a bio?” This is the kind of thing SQL is built for. Once we add more tables in Section 5, the same GROUP BY / HAVING machinery will answer much richer questions like “which books have the most reviews” or “which authors have written the most fiction.” The mechanics you just saw are the same.

[!NOTE] The clause order matters. WHERE comes before GROUP BY, and HAVING comes after it. ORDER BY is always near the end, with LIMIT last. If you ever get a syntax error and the SQL looks correct, check the clause order.

Exercises

Exercise 1: Sort your users by created_at in descending order. Then sort by name ascending, with ties broken by created_at descending.

Exercise 2: Use count(*) and count(bio) on the users table. If any users have NULL bios, the numbers will differ. Understand why.

Exercise 3: Write a query that counts users grouped by whether they have a bio or not. Hint: you can use CASE WHEN bio IS NULL THEN 'no bio' ELSE 'has bio' END as a column expression and group by it.

What is the difference between WHERE and HAVING?

We can now insert, query, filter, sort, and summarize data. But what about changing data that already exists? In the next lesson, we will learn how to update and delete rows.

Access Required

Sorting, grouping, and aggregates

Sorting with ORDER BY

Aggregate functions

GROUP BY

HAVING

Putting it all together

Exercises