How We Stream 1 Million SQL Rows Using Next.js and Web Streams

Technical deep dive into MockBlast's streaming architecture using Next.js App Router and the Web Streams API to generate and deliver millions of SQL rows efficiently.

The Challenge: Generating Millions of Rows

When building MockBlast, we faced a fundamental problem: how do you let users generate millions of rows without:

  • Running out of memory: Buffering 1M rows on the server would consume gigabytes of RAM.
  • Making users wait: Generating all rows before sending any would create long delays.
  • Timeout errors: Long-running requests would hit API Gateway or browser timeouts.
  • Blocking the server: One large request shouldn't block other users.

The answer: streaming.

The Architecture: Web Streams + Generator Functions

1. Generator Function for Row Production

// Pseudo-code: Generator function for row production
async generator generateRows(schema, count):
  batchSize = 1000
  
  for each batch:
    generate batchSize rows
    format as SQL INSERT statement
    yield batch to stream
    yield control to event loop (non-blocking)
  
  continue until all rows generated
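
Here is a possible concrete version of that pseudo-code as a TypeScript async generator. The two-column rows and the fakeRow helper are placeholders for illustration; MockBlast's real schema-driven generation would slot in there.

// Sketch (TypeScript): async generator that yields one SQL INSERT per batch
type Row = Record<string, string | number>;

// Placeholder data; stands in for MockBlast's column-aware row generation.
const fakeRow = (i: number): Row => ({ id: i, name: `user_${i}` });

async function* generateRows(table: string, count: number): AsyncGenerator<string> {
  const batchSize = 1000;

  for (let done = 0; done < count; done += batchSize) {
    const size = Math.min(batchSize, count - done);

    // Format one batch of rows as a single multi-row INSERT statement.
    const values = Array.from({ length: size }, (_, i) => {
      const row = fakeRow(done + i);
      return `(${row.id}, '${row.name}')`;
    }).join(",\n  ");
    yield `INSERT INTO ${table} (id, name) VALUES\n  ${values};\n`;

    // Hand control back to the event loop so other requests aren't starved.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}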

2. ReadableStream for HTTP Streaming

// Pseudo-code: Next.js Route Handler
POST /api/generate:
  create ReadableStream
  
  for each chunk from generator:
    encode chunk as text
    enqueue to stream
    stream sends to client immediately
  
  close stream when done
  
  return Response with stream
  headers: chunked transfer encoding
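
A minimal Route Handler along these lines might look like the sketch below, assuming the generateRows generator above and a request body of { table, rows } (the field names are illustrative, not MockBlast's actual API contract):

// Sketch (TypeScript): app/api/generate/route.ts
export async function POST(request: Request) {
  const { table, rows } = await request.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        for await (const chunk of generateRows(table, rows)) {
          // Each enqueued chunk is flushed to the client right away.
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    },
  });

  // No Content-Length is set, so the runtime falls back to chunked transfer encoding.
  return new Response(stream, {
    headers: { "Content-Type": "application/sql; charset=utf-8" },
  });
}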

3. Client-Side Stream Consumption

// Pseudo-code: Client-side stream consumption
function downloadSQL(schema, rowCount):
  fetch streaming endpoint
  get stream reader
  
  while stream has data:
    read chunk
    decode chunk
    append to SQL string
    update progress bar
  
  when stream complete:
    download SQL file
  
Benefits:

  • Immediate feedback
  • Progressive download
  • Memory efficient
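
A concrete version of this consumer, sketched in TypeScript (the onProgress callback and the request payload shape are assumptions for illustration):

// Sketch (TypeScript): reading the streamed response in the browser
async function downloadSQL(table: string, rowCount: number, onProgress: (bytes: number) => void) {
  const response = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ table, rows: rowCount }),
  });
  if (!response.body) throw new Error("Streaming not supported");

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let sql = "";
  let received = 0;

  // Read chunks as they arrive and update the progress UI.
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    sql += decoder.decode(value, { stream: true });
    received += value.byteLength;
    onProgress(received);
  }

  // Trigger a file download once the stream completes.
  const blob = new Blob([sql], { type: "application/sql" });
  const url = URL.createObjectURL(blob);
  const link = Object.assign(document.createElement("a"), { href: url, download: `${table}.sql` });
  link.click();
  URL.revokeObjectURL(url);
}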

Key Optimizations

🎯 Batching Strategy

Generate rows in batches of 1000:

  • Reduces function call overhead
  • Amortizes SQL formatting costs
  • Balances memory vs. throughput
  • 1000 rows = ~100KB chunks

⚡ Event Loop Yielding

Prevent blocking the event loop:

// Pseudo-code: Event loop yielding
yield control to event loop
  → allows other requests to process
  → prevents blocking
  → maintains responsiveness

The generator yields control every 1000 rows so other requests can be processed.
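
One simple way to express that yield is a small helper like the one below (an assumption about the implementation, not MockBlast's exact code):

// Sketch (TypeScript): defer the next batch to a later macrotask
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));
// setImmediate(resolve) is a Node-only alternative with slightly lower latency.

// In the generator: `await yieldToEventLoop()` after every 1000-row batch.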

📊 Memory Efficiency

Constant memory usage:

  • Only 1 batch in memory at once
  • Previous batches are GC'd
  • 10M rows = same memory as 1K
  • Server RAM: ~50MB per request

🚀 Backpressure Handling

ReadableStream handles backpressure:

  • Pauses generation if client is slow
  • Prevents memory buildup
  • Automatic flow control
  • No manual buffering needed
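
Backpressure comes for free when the stream is pull-based: the runtime only calls pull() when the consumer is ready for more data. A sketch of that wiring, assuming the generateRows generator from earlier:

// Sketch (TypeScript): pull-based stream so a slow client pauses generation
function streamFromGenerator(gen: AsyncGenerator<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();

  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      const { value, done } = await gen.next();
      if (done) {
        controller.close();
        return;
      }
      controller.enqueue(encoder.encode(value));
    },
    cancel() {
      // Client disconnected: stop the generator so no further rows are produced.
      void gen.return(undefined);
    },
  });
}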

Performance Metrics

Rows         Time to First Byte   Total Time   Server Memory
1,000        ~50ms                ~200ms       ~30MB
10,000       ~50ms                ~1.5s        ~35MB
100,000      ~50ms                ~12s         ~40MB
1,000,000    ~50ms                ~2 min       ~50MB

* Tested on: 10 columns, mixed data types, AWS Lambda (1GB RAM)

Error Handling in Streams

Streaming complicates error handling since HTTP headers are sent before you know if generation will succeed:

// Pseudo-code: Error handling in streams
create stream:
  try:
    generate and stream chunks
    close stream on success
  catch error:
    // HTTP headers already sent!
    // Can't change status code
    // Send error marker in stream
    enqueue error message to stream
    signal error to client
    
  Client detects error marker and displays it
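
A sketch of that pattern, assuming the generateRows generator from earlier (the "-- MOCKBLAST_ERROR:" marker is an illustrative convention, not MockBlast's actual wire format):

// Sketch (TypeScript): signal a mid-stream failure with an in-band marker
const ERROR_MARKER = "-- MOCKBLAST_ERROR:";

const stream = new ReadableStream<Uint8Array>({
  async start(controller) {
    const encoder = new TextEncoder();
    try {
      for await (const chunk of generateRows("users", 1_000_000)) {
        controller.enqueue(encoder.encode(chunk));
      }
    } catch (err) {
      // Headers (and the 200 status) are already on the wire, so emit a marker
      // the client can detect instead of a different status code.
      const message = err instanceof Error ? err.message : "generation failed";
      controller.enqueue(encoder.encode(`\n${ERROR_MARKER} ${message}\n`));
    } finally {
      controller.close();
    }
  },
});

// Client side: if a decoded chunk contains ERROR_MARKER, abort the download
// and surface the message instead of saving a truncated SQL file.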

Lessons Learned

1. Streaming is Not Always Faster

For small datasets (< 1000 rows), the overhead of streaming actually makes it slower than buffering. We automatically choose the best strategy based on row count.

2. Batch Size Matters

We tested batch sizes from 100 to 10,000 rows. The sweet spot is 1000 rows (~100KB per chunk): smaller batches increase overhead, while larger ones risk memory pressure and an unresponsive event loop.

3. Client-Side Buffering Can Be Tricky

Browsers buffer streamed data in memory. For very large downloads (100MB+), we had to implement a download-as-you-go strategy where chunks are written to disk progressively using the File System Access API.
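
The core of that strategy looks roughly like this (File System Access API, currently Chromium-only; the function name and suggested filename are illustrative):

// Sketch (TypeScript): write chunks to disk as they arrive instead of buffering
async function streamToDisk(response: Response, suggestedName = "seed.sql") {
  // showSaveFilePicker requires a user gesture and a secure context.
  const handle = await window.showSaveFilePicker({ suggestedName });
  const writable = await handle.createWritable();

  const reader = response.body!.getReader();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    await writable.write(value); // each chunk goes straight to disk
  }
  await writable.close();
}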

4. Progress Tracking Requires Planning

Since Content-Length is unknown with streaming, we send progress metadata in the stream itself (e.g., every 10,000 rows) so the client can show accurate progress bars.
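
For example, a progress marker can ride along as a SQL comment that any importer will ignore (the exact format here is an assumption, not MockBlast's actual one):

// Sketch (TypeScript): in-band progress markers
// Server, every 10,000 rows:
//   controller.enqueue(encoder.encode(`-- progress: ${done}/${total}\n`));

// Client: scan each decoded chunk for the marker and update the progress bar.
function extractProgress(chunk: string): number | null {
  const match = chunk.match(/^-- progress: (\d+)\/(\d+)$/m);
  return match ? Number(match[1]) / Number(match[2]) : null;
}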

Try It Yourself

Experience streaming data generation in action:

  1. Create a schema with 10+ columns
  2. Set row count to 100,000 or more
  3. Click generate and watch the progress bar
  4. Notice how data starts downloading immediately

Frequently Asked Questions

Why stream data instead of sending it all at once?
Streaming provides immediate feedback to users, reduces memory usage on both client and server, and enables progressive rendering. Instead of waiting for 1 million rows to generate before sending anything, we start delivering data as soon as the first rows are ready.
How does streaming work in Next.js App Router?
Next.js App Router supports streaming through Route Handlers that return ReadableStream responses. The server generates data chunks and writes them to the stream, which are immediately sent to the client without buffering the entire response in memory.
What's the difference between Web Streams and Server-Sent Events?
Web Streams (ReadableStream) provide a lower-level, more flexible API for streaming binary or text data. SSE is higher-level and text-only, designed for real-time updates. For bulk data transfer like SQL generation, Web Streams are more efficient.
Can you really generate 1 million rows without running out of memory?
Yes! By using generator functions and streaming, we never hold all rows in memory at once. We generate rows in batches (typically 1000 at a time), stream them to the client, and then move on to the next batch. Memory usage stays constant regardless of total row count.
How do you handle errors during streaming?
Once streaming starts, HTTP headers are already sent, so you can't return a different status code. We handle errors by sending error markers in the stream itself, which the client can detect and display. We also implement retry logic and idempotent generation for reliability.

Ready to Generate Mock Data?

Stop writing scripts manually. MockBlast generates production-ready seed data for Postgres, MySQL, MongoDB, and JSON in seconds.