Streaming Uploads
The memory problem revisited
The upload route from the earlier lesson collects all chunks into a Buffer before validating and saving. That works for small files but fails for large ones — buffering a 500 MB file needs at least 500 MB of RAM, regardless of how fast the disk is.
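For contrast, the buffered approach being replaced looks roughly like this (a sketch — the helper name `readWholeBody` is illustrative, not the exact code from the earlier lesson):

```typescript
// Buffered approach: every chunk is kept in an array and concatenated at the
// end, so the whole file sits in memory at once.
async function readWholeBody(body: ReadableStream<Uint8Array>): Promise<Buffer> {
  const chunks: Buffer[] = [];
  const reader = body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(Buffer.from(value)); // the entire file accumulates here
  }
  // Buffer.concat copies everything again, so peak memory can approach
  // twice the file size for a moment.
  return Buffer.concat(chunks);
}
```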
The fix: pipe the file stream directly to disk while validating on the fly.
Pipe from request to disk
import Busboy from "busboy";
import { createWriteStream, unlinkSync } from "node:fs";
import { extname, join } from "node:path";
import { Readable } from "node:stream";
import { UPLOAD_DIR } from "../storage.js";
import { db } from "../db.js"; // adjust to wherever your database handle lives
route.post("/files", {
  resolve: async (c) => {
    const userId = "user-alice"; // Replace with real auth
    const parsed = await parseStreamingUpload(c.request, userId);
    if (parsed instanceof Response) return parsed;
    return Response.json(
      {
        id: parsed.id,
        name: parsed.originalName,
        mimeType: parsed.mimeType,
        size: parsed.size,
        url: `/files/${parsed.id}`,
      },
      { status: 201 },
    );
  },
});
async function parseStreamingUpload(
  request: Request,
  userId: string,
): Promise<{ id: string; originalName: string; mimeType: string; size: number } | Response> {
  const contentType = request.headers.get("content-type");
  if (!contentType?.includes("multipart/form-data")) {
    return Response.json({ error: "Expected multipart/form-data" }, { status: 400 });
  }

  return new Promise((resolve) => {
    const busboy = Busboy({
      headers: { "content-type": contentType },
      limits: { fileSize: 100 * 1024 * 1024, files: 1 }, // 100 MB
    });

    const fields: Record<string, string> = {};
    let fileTruncated = false;
    let savedFile: {
      id: string;
      originalName: string;
      storedName: string;
      mimeType: string;
      size: number;
    } | null = null;

    busboy.on("field", (name, value) => {
      fields[name] = value;
    });

    busboy.on("file", (fieldName, stream, info) => {
      const storedName = `${crypto.randomUUID()}${extname(info.filename).toLowerCase()}`;
      const filePath = join(UPLOAD_DIR, storedName);
      const writeStream = createWriteStream(filePath);
      let bytesWritten = 0;

      stream.on("data", (chunk: Buffer) => {
        bytesWritten += chunk.length;
      });

      // Busboy emits "limit" on the file stream when fileSize is exceeded.
      stream.on("limit", () => {
        fileTruncated = true;
        writeStream.destroy();
        // Delete the partial file
        try {
          unlinkSync(filePath);
        } catch {}
      });

      stream.pipe(writeStream);

      stream.on("end", () => {
        if (!fileTruncated) {
          // Note: for strict durability you would also wait for writeStream's
          // "finish" event before treating the file as saved.
          savedFile = {
            id: crypto.randomUUID(),
            originalName: info.filename,
            storedName,
            mimeType: info.mimeType,
            size: bytesWritten,
          };
        }
      });
    });

    busboy.on("error", () => {
      resolve(Response.json({ error: "Malformed multipart body" }, { status: 400 }));
    });

    busboy.on("close", () => {
      if (fileTruncated) {
        resolve(Response.json({ error: "File exceeds size limit" }, { status: 413 }));
        return;
      }
      if (!savedFile) {
        resolve(Response.json({ error: "No file uploaded" }, { status: 400 }));
        return;
      }

      // Record in database
      db.prepare(
        "INSERT INTO files (id, user_id, original_name, stored_name, mime_type, size) VALUES (?, ?, ?, ?, ?, ?)",
      ).run(
        savedFile.id,
        userId,
        savedFile.originalName,
        savedFile.storedName,
        savedFile.mimeType,
        savedFile.size,
      );
      resolve(savedFile);
    });

    // Pipe request body to busboy
    const body = request.body;
    if (!body) {
      resolve(Response.json({ error: "No body" }, { status: 400 }));
      return;
    }
    const reader = body.getReader();
    const nodeStream = new Readable({
      async read() {
        const { done, value } = await reader.read();
        if (done) this.push(null);
        else this.push(Buffer.from(value));
      },
    });
    nodeStream.pipe(busboy);
  });
}

The key difference: stream.pipe(writeStream) sends file data directly from the network to the disk. At any point, only about one chunk (~64 KB by default) of file data is in memory, so a 500 MB file still uses on the order of 64 KB of RAM.
Validating during the stream
MIME type validation needs the first few bytes. Buffer just the first chunk:
busboy.on("file", (fieldName, stream, info) => {
  let firstChunk: Buffer | null = null;
  stream.on("data", (chunk: Buffer) => {
    if (!firstChunk) {
      firstChunk = chunk;
      // Validate MIME type from first chunk
      // If invalid, destroy the stream
    }
  });
});

For full MIME validation, intercept the first chunk, check its magic bytes, and only then start piping to disk. If validation fails, destroy the write stream and delete any partial file.
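A minimal magic-byte check might look like this (the signature table is illustrative, not exhaustive — a production service would use a fuller list or a library):

```typescript
// Map well-known leading bytes ("magic bytes") to MIME types.
const SIGNATURES: Array<{ mime: string; bytes: number[] }> = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47] }, // \x89PNG
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // %PDF
];

// Returns the detected MIME type, or null if the bytes match nothing we know.
function sniffMimeType(firstChunk: Buffer): string | null {
  for (const { mime, bytes } of SIGNATURES) {
    if (
      firstChunk.length >= bytes.length &&
      bytes.every((b, i) => firstChunk[i] === b)
    ) {
      return mime;
    }
  }
  return null;
}
```

Inside the data handler above, you could compare sniffMimeType(firstChunk) against the declared info.mimeType and destroy both streams on a mismatch.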
Exercises
Exercise 1: Upload a 50 MB file (create one with dd if=/dev/zero of=big.bin bs=1M count=50). Monitor memory usage during the upload. It should stay constant (not grow by 50 MB).
Exercise 2: Upload a file that exceeds the size limit. Verify the partial file is cleaned up from disk.
Exercise 3: Compare memory usage between the buffered approach (collect all chunks, then save) and the streaming approach (pipe to disk) for a 50 MB file.
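For Exercises 1 and 3, one simple way to watch memory from inside the server process (an optional sketch, not part of the upload route):

```typescript
// Log RSS and heap usage once per second while uploads are in flight.
const monitor = setInterval(() => {
  const { rss, heapUsed } = process.memoryUsage();
  console.log(
    `rss=${(rss / 1024 / 1024).toFixed(1)} MB heap=${(heapUsed / 1024 / 1024).toFixed(1)} MB`,
  );
}, 1000);
monitor.unref(); // don't keep the process alive just for the monitor
```

With the streaming route, rss should stay roughly flat during a 50 MB upload; with the buffered route, it should climb by about the file size.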
How much memory does a streaming upload use for a 500 MB file?