How We Stream 1 Million SQL Rows using Next.js and Web Streams
Technical deep dive into MockBlast's streaming architecture using Next.js App Router and the Web Streams API to generate and deliver millions of SQL rows efficiently.
The Challenge: Generating Millions of Rows
When building MockBlast, we faced a fundamental problem: how do you let users generate millions of rows without:
- ❌ Running out of memory: Buffering 1M rows on the server would consume gigabytes of RAM.
- ❌ Making users wait: Generating all rows before sending any would create long delays.
- ❌ Timeout errors: Long-running requests would hit API Gateway or browser timeouts.
- ❌ Blocking the server: One large request shouldn't block other users.
The answer: streaming.
The Architecture: Web Streams + Generator Functions
1. Generator Function for Row Production
// Pseudo-code: Generator function for row production
async generator generateRows(schema, count):
  batchSize = 1000
  for each batch:
    generate batchSize rows
    format as SQL INSERT statement
    yield batch to stream
    yield control to event loop (non-blocking)
  continue until all rows generated
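In TypeScript, the generator can be an async generator function that yields one formatted SQL chunk per batch. The sketch below is illustrative rather than MockBlast's actual code; `ColumnDef` and `formatInsert` are assumed stand-ins for the real schema type and SQL formatter (a possible formatter is sketched under Batching Strategy below).

```ts
// Illustrative sketch only. ColumnDef and formatInsert are assumed stand-ins
// for MockBlast's real schema type and SQL formatter.
type ColumnDef = { name: string; mock: () => string | number };
declare function formatInsert(schema: ColumnDef[], rows: (string | number)[][]): string;

async function* generateRows(
  schema: ColumnDef[],
  count: number,
  batchSize = 1000
): AsyncGenerator<string> {
  for (let offset = 0; offset < count; offset += batchSize) {
    const size = Math.min(batchSize, count - offset);
    // Build one batch of rows from the per-column mock generators.
    const rows = Array.from({ length: size }, () => schema.map((col) => col.mock()));
    // One multi-row INSERT statement per batch.
    yield formatInsert(schema, rows);
    // Hand control back to the event loop so other requests can run.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}
```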
2. ReadableStream for HTTP Streaming
// Pseudo-code: Next.js Route Handler
POST /api/generate:
  create ReadableStream
  for each chunk from generator:
    encode chunk as text
    enqueue to stream
    stream sends to client immediately
  close stream when done
  return Response with stream
    headers: chunked transfer encoding
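Wiring that generator into an App Router Route Handler can look roughly like this sketch (assuming a route at `app/api/generate/route.ts` and a hypothetical `{ schema, rowCount }` request body; this is not MockBlast's exact handler):

```ts
// app/api/generate/route.ts — illustrative sketch, not the production handler.
export async function POST(req: Request): Promise<Response> {
  // Hypothetical request shape: { schema: ColumnDef[]; rowCount: number }
  const { schema, rowCount } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        // Each enqueued chunk is flushed to the client as soon as it is ready.
        for await (const chunk of generateRows(schema, rowCount)) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  // No Content-Length header: the response goes out with chunked transfer
  // encoding, so bytes reach the client while later batches are still being
  // generated. A pull()-based source (see Backpressure Handling below)
  // additionally pauses generation when the client falls behind.
  return new Response(stream, {
    headers: {
      "Content-Type": "application/sql; charset=utf-8",
      "Cache-Control": "no-store",
    },
  });
}
```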
3. Client-Side Stream Consumption
// Pseudo-code: Client-side stream consumption
function downloadSQL(schema, rowCount):
  fetch streaming endpoint
  get stream reader
  while stream has data:
    read chunk
    decode chunk
    append to SQL string
    update progress bar
  when stream complete:
    download SQL file
Benefits:
- Immediate feedback
- Progressive download
- Memory efficient
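On the client this can be plain `fetch` plus a stream reader. A minimal sketch, assuming the `/api/generate` endpoint above and a hypothetical `onProgress` callback driving the progress bar:

```ts
// Sketch of client-side stream consumption; onProgress is a hypothetical callback.
async function downloadSQL(
  schema: unknown,
  rowCount: number,
  onProgress: (bytesReceived: number) => void
): Promise<void> {
  const res = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ schema, rowCount }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let sql = "";
  let received = 0;

  // Read chunks as they arrive instead of waiting for the full response.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    received += value.byteLength;
    sql += decoder.decode(value, { stream: true });
    onProgress(received);
  }
  sql += decoder.decode(); // flush any buffered bytes

  // Trigger a regular browser download of the assembled SQL file.
  const url = URL.createObjectURL(new Blob([sql], { type: "application/sql" }));
  const a = document.createElement("a");
  a.href = url;
  a.download = "mock-data.sql";
  a.click();
  URL.revokeObjectURL(url);
}
```

Note that this version still assembles the whole SQL string in memory before downloading it; the File System Access API variant described under Lessons Learned below avoids that for very large exports.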
Key Optimizations
🎯 Batching Strategy
Generate rows in batches of 1000 (see the sketch after this list):
- Reduces function call overhead
- Amortizes SQL formatting costs
- Balances memory vs. throughput
- 1000 rows = ~100KB chunks
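For illustration, here is one possible shape for the `formatInsert` helper assumed earlier: it turns a 1,000-row batch into a single multi-row INSERT statement. The table name and quoting are simplified for the sketch; real code needs dialect-aware escaping and type handling.

```ts
// One possible shape for the formatInsert helper assumed earlier (sketch only).
// Quoting is simplified; real code needs dialect-aware escaping.
function formatInsert(
  schema: { name: string }[],
  rows: (string | number)[][],
  table = "mock_data" // hypothetical table name
): string {
  const columns = schema.map((c) => c.name).join(", ");
  const values = rows
    .map((row) => {
      const vals = row
        .map((v) =>
          typeof v === "number" ? String(v) : `'${String(v).replace(/'/g, "''")}'`
        )
        .join(", ");
      return `(${vals})`;
    })
    .join(",\n  ");
  // ~1,000 rows per statement works out to chunks of roughly 100KB of SQL text.
  return `INSERT INTO ${table} (${columns}) VALUES\n  ${values};\n`;
}
```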
⚡ Event Loop Yielding
Prevent blocking the event loop:
// Pseudo-code: Event loop yielding
yield control to event loop
  → allows other requests to process
  → prevents blocking
  → maintains responsiveness
Yields control every 1000 rows so other requests can be processed.
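The "yield control" step can be as small as awaiting a macrotask between batches. A sketch (`setImmediate` is Node-specific; `setTimeout(..., 0)` works in both Node and the browser):

```ts
// Let other pending requests run before generating the next batch.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setImmediate(resolve));

// Inside the generator loop:
//   yield formatInsert(schema, rows);
//   await yieldToEventLoop();
```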
📊 Memory Efficiency
Constant memory usage:
- Only 1 batch in memory at once
- Previous batches are GC'd
- 10M rows = same memory as 1K
- Server RAM: ~50MB per request
🚀 Backpressure Handling
ReadableStream handles backpressure (see the sketch after this list):
- Pauses generation if client is slow
- Prevents memory buildup
- Automatic flow control
- No manual buffering needed
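One way to get that automatic pausing is a `pull()`-based underlying source: the runtime only calls `pull()` when the stream's internal queue is below its high-water mark, so a slow reader stops the generator from advancing. A sketch of that variant (an alternative to the `start()`-based handler above, not necessarily what MockBlast ships):

```ts
// Pull-based variant: the generator is only advanced when the consumer
// is ready for more data, so backpressure is handled for free.
function sqlStream(rows: AsyncGenerator<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      // Called only when the internal queue has room for another chunk.
      const { value, done } = await rows.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
    cancel() {
      // Client disconnected or aborted: stop generating rows.
      void rows.return?.(undefined);
    },
  });
}
```

The `cancel()` hook also stops generation early if the user aborts the download.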
Performance Metrics
| Rows | Time to First Byte | Total Time | Server Memory |
|---|---|---|---|
| 1,000 | ~50ms | ~200ms | ~30MB |
| 10,000 | ~50ms | ~1.5s | ~35MB |
| 100,000 | ~50ms | ~12s | ~40MB |
| 1,000,000 | ~50ms | ~2 min | ~50MB |
* Tested on: 10 columns, mixed data types, AWS Lambda (1GB RAM)
Error Handling in Streams
Streaming complicates error handling since HTTP headers are sent before you know if generation will succeed:
// Pseudo-code: Error handling in streams
create stream:
  try:
    generate and stream chunks
    close stream on success
  catch error:
    // HTTP headers already sent!
    // Can't change status code
    // Send error marker in stream
    enqueue error message to stream
    signal error to client
Client detects error marker and displays it
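A sketch of that pattern in the route handler (the `-- GENERATION-ERROR:` marker string is an assumed format for illustration, and `schema`, `rowCount`, and `generateRows` come from the handler sketch above):

```ts
// Sketch of the error-marker pattern; the marker format is hypothetical.
const ERROR_MARKER = "-- GENERATION-ERROR: ";
const encoder = new TextEncoder();

const stream = new ReadableStream<Uint8Array>({
  async start(controller) {
    try {
      for await (const chunk of generateRows(schema, rowCount)) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    } catch (err) {
      // Headers and a 200 status are already on the wire, so surface the
      // failure inside the body where the client can detect it.
      const message = err instanceof Error ? err.message : "generation failed";
      controller.enqueue(encoder.encode(`\n${ERROR_MARKER}${message}\n`));
      // Close (rather than error) so the marker chunk is actually delivered.
      controller.close();
    }
  },
});

// Client side, while reading chunks:
//   if (text.includes(ERROR_MARKER)) showError(text.split(ERROR_MARKER)[1]);
```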
Lessons Learned
1. Streaming is Not Always Faster
For small datasets (< 1000 rows), the overhead of streaming actually makes it slower than buffering. We automatically choose the best strategy based on row count.
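A hedged sketch of that decision; the 1,000-row threshold comes from the text, while `streamingResponse` is a hypothetical wrapper around the streaming handler shown earlier:

```ts
// Hypothetical strategy switch between buffering and streaming.
declare function streamingResponse(schema: ColumnDef[], rowCount: number): Response;

const STREAM_THRESHOLD = 1_000;

async function respond(schema: ColumnDef[], rowCount: number): Promise<Response> {
  if (rowCount < STREAM_THRESHOLD) {
    // Small exports: cheaper to build the whole string and send it in one go.
    let sql = "";
    for await (const batch of generateRows(schema, rowCount)) sql += batch;
    return new Response(sql, {
      headers: { "Content-Type": "application/sql; charset=utf-8" },
    });
  }
  // Large exports: stream, as in the route handler sketch above.
  return streamingResponse(schema, rowCount);
}
```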
2. Batch Size Matters
We tested batch sizes from 100 to 10,000; the sweet spot is 1,000 rows (~100KB). Smaller batches increase overhead, while larger ones risk memory pressure and unresponsiveness.
3. Client-Side Buffering Can Be Tricky
Browsers buffer streamed data in memory. For very large downloads (100MB+), we had to implement a download-as-you-go strategy where chunks are written to disk progressively using the File System Access API.
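Where the File System Access API is available (Chromium-based browsers, in a secure context, after a user gesture), the response body can be piped straight to disk. A sketch of that download-as-you-go path (not the exact MockBlast code; TypeScript may need the WICG file-system-access type definitions for `showSaveFilePicker`):

```ts
// Stream the response body straight to a file on disk instead of
// accumulating it in memory. Requires the File System Access API.
async function downloadToDisk(res: Response): Promise<void> {
  if (!res.body) throw new Error("response has no body");

  // Ask the user where to save the file (must follow a user gesture).
  const handle = await window.showSaveFilePicker({
    suggestedName: "mock-data.sql",
    types: [{ description: "SQL", accept: { "application/sql": [".sql"] } }],
  });

  const writable = await handle.createWritable();
  // pipeTo writes each network chunk to disk as it arrives and closes the
  // file when the stream ends, so client memory stays flat.
  await res.body.pipeTo(writable);
}
```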
4. Progress Tracking Requires Planning
Since Content-Length is unknown with streaming, we send progress metadata in the stream itself (e.g., every 10,000 rows) so the client can show accurate progress bars.
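For example, the generator can interleave a SQL comment every 10,000 rows and the client can parse it out of each decoded chunk. The `-- progress:` format below is an assumption for illustration, not a documented wire format:

```ts
// Server side: interleave hypothetical progress comments with the data.
//   yield `-- progress: ${offset + size}/${count}\n`;

// Client side: pull progress markers out of each decoded chunk.
const PROGRESS_RE = /^-- progress: (\d+)\/(\d+)$/gm;

function updateProgress(chunkText: string, onProgress: (pct: number) => void): void {
  for (const match of chunkText.matchAll(PROGRESS_RE)) {
    const done = Number(match[1]);
    const total = Number(match[2]);
    onProgress(Math.round((done / total) * 100));
  }
}
```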
Try It Yourself
Experience streaming data generation in action:
1. Create a schema with 10+ columns
2. Set row count to 100,000 or more
3. Click generate and watch the progress bar
4. Notice how data starts downloading immediately
Related Resources
PostgreSQL Mock Data Generator
Generate millions of rows of PostgreSQL mock data using server-side streaming. Full support for JSONB, UUID, arrays, and Postgres-specific types.
MySQL Dummy Data Generator
Stream millions of MySQL rows efficiently with proper datetime formatting, boolean handling, and MySQL-specific syntax. Constant memory usage.
MongoDB Mock Data Generator
Generate large MongoDB datasets with native ObjectId, ISODate wrappers, and MongoDB shell scripts. Streaming support for millions of documents.
How to Generate PostgreSQL Seed Data
Step-by-step guide to generating realistic PostgreSQL seed data at scale with JSONB, UUID, arrays, and foreign keys.
Frequently Asked Questions
- Why stream data instead of sending it all at once?
- Streaming provides immediate feedback to users, reduces memory usage on both client and server, and enables progressive rendering. Instead of waiting for 1 million rows to generate before sending anything, we start delivering data as soon as the first rows are ready.
- How does streaming work in Next.js App Router?
- Next.js App Router supports streaming through Route Handlers that return ReadableStream responses. The server generates data chunks and writes them to the stream, which are immediately sent to the client without buffering the entire response in memory.
- What's the difference between Web Streams and Server-Sent Events?
- Web Streams (ReadableStream) provide a lower-level, more flexible API for streaming binary or text data. SSE is higher-level and text-only, designed for real-time updates. For bulk data transfer like SQL generation, Web Streams are more efficient.
- Can you really generate 1 million rows without running out of memory?
- Yes! By using generator functions and streaming, we never hold all rows in memory at once. We generate rows in batches (typically 1000 at a time), stream them to the client, and then move on to the next batch. Memory usage stays constant regardless of total row count.
- How do you handle errors during streaming?
- Once streaming starts, HTTP headers are already sent, so you can't return a different status code. We handle errors by sending error markers in the stream itself, which the client can detect and display. We also implement retry logic and idempotent generation for reliability.