Streaming Uploads
The memory problem revisited
The upload route from the earlier lesson collects all chunks into a Buffer before validating and saving. That works for small files but fails for large ones — buffering a 500 MB file needs at least 500 MB of RAM, regardless of how fast the disk is.
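For contrast, the buffered approach being replaced looks roughly like this (a sketch — the helper name `readWholeBody` is illustrative, not the exact code from the earlier lesson):

```typescript
// Buffered approach: every chunk is kept in an array and concatenated at the
// end, so the whole file sits in memory at once.
async function readWholeBody(body: ReadableStream<Uint8Array>): Promise<Buffer> {
  const chunks: Buffer[] = [];
  const reader = body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(Buffer.from(value)); // the entire file accumulates here
  }
  // Buffer.concat copies everything again, so peak memory can approach
  // twice the file size for a moment.
  return Buffer.concat(chunks);
}
```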
The fix: pipe the file stream directly to disk while validating on the fly.
Pipe from request to disk
import Busboy from "busboy";
import { createWriteStream, unlinkSync } from "node:fs";
import { extname, join } from "node:path";
import { Readable } from "node:stream";
import { UPLOAD_DIR } from "../storage.js";
import { db } from "../db.js"; // adjust to wherever your database handle lives
route.post("/files", {
  resolve: async (c) => {
    const userId = "user-alice"; // Replace with real auth
    const parsed = await parseStreamingUpload(c.request, userId);
    if (parsed instanceof Response) return parsed;
    return Response.json(
      {
        id: parsed.id,
        name: parsed.originalName,
        mimeType: parsed.mimeType,
        size: parsed.size,
        url: `/files/${parsed.id}`,
      },
      { status: 201 },
    );
  },
});
async function parseStreamingUpload(
  request: Request,
  userId: string,
): Promise<{ id: string; originalName: string; mimeType: string; size: number } | Response> {
  const contentType = request.headers.get("content-type");
  if (!contentType?.includes("multipart/form-data")) {
    return Response.json({ error: "Expected multipart/form-data" }, { status: 400 });
  }

  return new Promise((resolve) => {
    const busboy = Busboy({
      headers: { "content-type": contentType },
      limits: { fileSize: 100 * 1024 * 1024, files: 1 }, // 100 MB
    });

    const fields: Record<string, string> = {};
    let fileTruncated = false;
    let savedFile: {
      id: string;
      originalName: string;
      storedName: string;
      mimeType: string;
      size: number;
    } | null = null;

    busboy.on("field", (name, value) => {
      fields[name] = value;
    });

    busboy.on("file", (fieldName, stream, info) => {
      const storedName = `${crypto.randomUUID()}${extname(info.filename).toLowerCase()}`;
      const filePath = join(UPLOAD_DIR, storedName);
      const writeStream = createWriteStream(filePath);
      let bytesWritten = 0;

      stream.on("data", (chunk: Buffer) => {
        bytesWritten += chunk.length;
      });

      // Busboy emits "limit" on the file stream when fileSize is exceeded.
      stream.on("limit", () => {
        fileTruncated = true;
        writeStream.destroy();
        // Delete the partial file
        try {
          unlinkSync(filePath);
        } catch {}
      });

      stream.pipe(writeStream);

      stream.on("end", () => {
        if (!fileTruncated) {
          // Note: for strict durability you would also wait for writeStream's
          // "finish" event before treating the file as saved.
          savedFile = {
            id: crypto.randomUUID(),
            originalName: info.filename,
            storedName,
            mimeType: info.mimeType,
            size: bytesWritten,
          };
        }
      });
    });

    busboy.on("error", () => {
      resolve(Response.json({ error: "Malformed multipart body" }, { status: 400 }));
    });

    busboy.on("close", () => {
      if (fileTruncated) {
        resolve(Response.json({ error: "File exceeds size limit" }, { status: 413 }));
        return;
      }
      if (!savedFile) {
        resolve(Response.json({ error: "No file uploaded" }, { status: 400 }));
        return;
      }

      // Record in database
      db.prepare(
        "INSERT INTO files (id, user_id, original_name, stored_name, mime_type, size) VALUES (?, ?, ?, ?, ?, ?)",
      ).run(
        savedFile.id,
        userId,
        savedFile.originalName,
        savedFile.storedName,
        savedFile.mimeType,
        savedFile.size,
      );
      resolve(savedFile);
    });

    // Pipe request body to busboy
    const body = request.body;
    if (!body) {
      resolve(Response.json({ error: "No body" }, { status: 400 }));
      return;
    }
    const reader = body.getReader();
    const nodeStream = new Readable({
      async read() {
        const { done, value } = await reader.read();
        if (done) this.push(null);
        else this.push(Buffer.from(value));
      },
    });
    nodeStream.pipe(busboy);
  });
}

The key difference: stream.pipe(writeStream) sends file data directly from the network to the disk. At any point, only about one chunk (~64 KB by default) of file data is in memory, so a 500 MB file still uses on the order of 64 KB of RAM.
Validating during the stream
MIME type validation needs the first few bytes. Buffer just the first chunk:
busboy.on("file", (fieldName, stream, info) => {
  let firstChunk: Buffer | null = null;
  stream.on("data", (chunk: Buffer) => {
    if (!firstChunk) {
      firstChunk = chunk;
      // Validate MIME type from first chunk
      // If invalid, destroy the stream
    }
  });
});

For full MIME validation, intercept the first chunk, check its magic bytes, and only then start piping to disk. If validation fails, destroy the write stream and delete any partial file.
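A minimal magic-byte check might look like this (the signature table is illustrative, not exhaustive — a production service would use a fuller list or a library):

```typescript
// Map well-known leading bytes ("magic bytes") to MIME types.
const SIGNATURES: Array<{ mime: string; bytes: number[] }> = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47] }, // \x89PNG
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // %PDF
];

// Returns the detected MIME type, or null if the bytes match nothing we know.
function sniffMimeType(firstChunk: Buffer): string | null {
  for (const { mime, bytes } of SIGNATURES) {
    if (
      firstChunk.length >= bytes.length &&
      bytes.every((b, i) => firstChunk[i] === b)
    ) {
      return mime;
    }
  }
  return null;
}
```

Inside the data handler above, you could compare sniffMimeType(firstChunk) against the declared info.mimeType and destroy both streams on a mismatch.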
Exercises
Exercise 1: Upload a 50 MB file (create one with dd if=/dev/zero of=big.bin bs=1M count=50). Monitor memory usage during the upload. It should stay constant (not grow by 50 MB).
Exercise 2: Upload a file that exceeds the size limit. Verify the partial file is cleaned up from disk.
Exercise 3: Compare memory usage between the buffered approach (collect all chunks, then save) and the streaming approach (pipe to disk) for a 50 MB file.
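For Exercises 1 and 3, one simple way to watch memory from inside the server process (an optional sketch, not part of the upload route):

```typescript
// Log RSS and heap usage once per second while uploads are in flight.
const monitor = setInterval(() => {
  const { rss, heapUsed } = process.memoryUsage();
  console.log(
    `rss=${(rss / 1024 / 1024).toFixed(1)} MB heap=${(heapUsed / 1024 / 1024).toFixed(1)} MB`,
  );
}, 1000);
monitor.unref(); // don't keep the process alive just for the monitor
```

With the streaming route, rss should stay roughly flat during a 50 MB upload; with the buffered route, it should climb by about the file size.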
How much memory does a streaming upload use for a 500 MB file?