# Caching Computed Results

## Beyond raw queries
Some cached data is not the result of a single database query: it is a computed result that requires multiple queries, transformations, or aggregations. Computing it is expensive, and the result changes infrequently. Caching it avoids repeating the computation.
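The examples in this lesson rely on a `cacheThrough` helper. As a reminder, here is a minimal sketch of what such a read-through TTL cache can look like, assuming a single-process, in-memory `Map` (the course's actual implementation may differ):

```ts
// Minimal read-through TTL cache sketch. Assumes one process and an
// in-memory Map; entries past their expiry are recomputed on next read.
type CacheEntry = { value: unknown; expiresAt: number };
const cache = new Map<string, CacheEntry>();

async function cacheThrough<T>(
  key: string,
  ttlMs: number,
  compute: () => T | Promise<T>,
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) {
    return hit.value as T; // cache hit: skip the computation entirely
  }
  const value = await compute(); // cache miss: compute and store
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}
```

Expired entries are not evicted eagerly here; they are simply overwritten on the next miss, which keeps the sketch small.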
## Average ratings

The book detail endpoint computes the average rating with a subquery:

```sql
(SELECT AVG(rating) FROM reviews WHERE book_id = books.id) AS avg_rating
```

This subquery runs on every request. An alternative: compute the result once and cache it.
```ts
async function getBookWithRating(bookId: string) {
  return cacheThrough(`book-with-rating:${bookId}`, 5 * 60_000, () => {
    const book = db
      .prepare(
        "SELECT books.*, authors.name AS author_name FROM books JOIN authors ON books.author_id = authors.id WHERE books.id = ?",
      )
      .get(bookId) as any;
    if (!book) return null;
    const stats = db
      .prepare(
        "SELECT AVG(rating) AS avg_rating, COUNT(*) AS review_count FROM reviews WHERE book_id = ?",
      )
      .get(bookId) as any;
    return { ...book, avgRating: stats.avg_rating, reviewCount: stats.review_count };
  });
}
```

Both queries run once, and the combined result is cached for 5 minutes.
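A 5-minute TTL means ratings can be up to 5 minutes stale after a new review is posted. If fresher data matters, the write path can invalidate the key explicitly. A sketch, assuming the in-memory `Map`-based cache and a hypothetical `cacheDelete` helper (the names here are assumptions, not the course's API):

```ts
// Hypothetical invalidation helper; assumes the cache is an in-memory Map.
const cache = new Map<string, { value: unknown; expiresAt: number }>();

function cacheDelete(key: string): boolean {
  return cache.delete(key);
}

// Called by the review-creation handler after its INSERT succeeds, so the
// next read recomputes every cached result derived from this book's reviews.
function invalidateBookCaches(bookId: string) {
  cacheDelete(`book-with-rating:${bookId}`);
  cacheDelete("leaderboard:most-reviewed");
  cacheDelete("catalog-stats");
}
```

The trade-off: every place that writes reviews must know every key derived from them, which is why short TTLs are often preferred when slight staleness is acceptable.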
> [!NOTE]
> The Database Design course’s When to Denormalize lesson solved this with denormalized `review_count` and `avg_rating` columns on the `books` table, updated by triggers. Caching is an alternative: instead of storing computed values in the database, store them in memory. Both approaches avoid recomputing on every request. Denormalization is better when many different queries need the value; caching is better when only a few endpoints use it.
## Leaderboards and rankings

A “most reviewed books” leaderboard requires counting reviews, sorting, and limiting. This is a multi-table aggregation:
```ts
async function getMostReviewed() {
  return cacheThrough("leaderboard:most-reviewed", 10 * 60_000, () =>
    db
      .prepare(
        `
        SELECT books.id, books.title, authors.name AS author_name,
               COUNT(reviews.id) AS review_count
        FROM books
        JOIN authors ON books.author_id = authors.id
        JOIN reviews ON reviews.book_id = books.id
        GROUP BY books.id
        ORDER BY review_count DESC
        LIMIT 20
        `,
      )
      .all(),
  );
}
```

Leaderboards are perfect cache candidates: expensive to compute, requested frequently, and slow to change (they only shift when new reviews are posted).
## Aggregated statistics

Dashboard-style endpoints that summarize data are another good fit:
```ts
async function getCatalogStats() {
  return cacheThrough("catalog-stats", 15 * 60_000, () => {
    const bookCount = (db.prepare("SELECT COUNT(*) AS c FROM books").get() as any).c;
    const authorCount = (db.prepare("SELECT COUNT(*) AS c FROM authors").get() as any).c;
    const reviewCount = (db.prepare("SELECT COUNT(*) AS c FROM reviews").get() as any).c;
    const avgRating = (db.prepare("SELECT AVG(rating) AS avg FROM reviews").get() as any).avg;
    return {
      books: bookCount,
      authors: authorCount,
      reviews: reviewCount,
      averageRating: avgRating ? Math.round(avgRating * 10) / 10 : null,
    };
  });
}
```

All four queries run once, and the summary is cached for 15 minutes. A dashboard that polls this endpoint every 5 seconds makes 180 requests in 15 minutes, and only the first one touches the database.
## Precomputing on a schedule

For expensive computations, precompute the result in a background job instead of on the first request:
```ts
// Background job handler (from the Background Jobs course)
const handlers = {
  compute_leaderboard: async () => {
    const leaderboard = db.prepare("SELECT ...").all();
    cacheSet("leaderboard:most-reviewed", leaderboard, 15 * 60_000);
    console.log("[CACHE] Leaderboard precomputed");
  },
};

// Cron: recompute every 10 minutes
registerCron("compute_leaderboard", "*/10 * * * *");
```

> [!NOTE]
> This combines the Background Jobs course’s cron scheduling with caching. The leaderboard is always warm: no user ever triggers the expensive computation, and the first visitor sees precomputed data, not a cache miss. Because the 15-minute TTL exceeds the 10-minute recompute interval, each entry is overwritten before it can expire.
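The two pieces this pattern relies on can be sketched minimally, assuming the same in-memory `Map` cache. `cacheSet` writes an entry directly rather than through a read miss, and a plain `setInterval` stands in for the course's cron scheduler (an assumption, not the real `registerCron`):

```ts
const cache = new Map<string, { value: unknown; expiresAt: number }>();

// Write a value into the cache directly, bypassing the read-through path.
function cacheSet(key: string, value: unknown, ttlMs: number) {
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
}

// Stand-in for cron scheduling: run the job immediately, then repeat on a
// fixed interval. Returns the timer so callers can cancel it.
function scheduleEvery(intervalMs: number, job: () => void): NodeJS.Timeout {
  job();
  return setInterval(job, intervalMs);
}
```

Running the job once at registration warms the cache at startup, so even the very first request after a deploy is a hit.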
## Exercises

Exercise 1: Cache a computed result (book with rating stats). Verify the two queries run only once per TTL window.

Exercise 2: Create a catalog stats endpoint and cache it for 15 minutes. Call it 100 times and verify the four queries run only once.

Exercise 3: Precompute the leaderboard with a background job. Verify the endpoint never triggers a database query (always a cache hit).

When is caching a better choice than denormalization for computed values?