How We Stream 1 Million SQL Rows using Next.js and Web Streams
Technical deep dive into MockBlast's streaming architecture using Next.js App Router and the Web Streams API to generate and deliver millions of SQL rows efficiently.
The Challenge: Generating Millions of Rows
When building MockBlast, we faced a fundamental problem: how do you let users generate millions of rows without:
- ❌ Running out of memory: Buffering 1M rows on the server would consume gigabytes of RAM.
- ❌ Making users wait: Generating all rows before sending any would create long delays.
- ❌ Timeout errors: Long-running requests would hit API Gateway or browser timeouts.
- ❌ Blocking the server: One large request shouldn't block other users.
The answer: streaming.
The Architecture: Web Streams + Generator Functions
1. Generator Function for Row Production
// Pseudo-code: Generator function for row production
async generator generateRows(schema, count):
  batchSize = 1000
  for each batch:
    generate batchSize rows
    format as SQL INSERT statement
    yield batch to stream
    yield control to event loop (non-blocking)
  continue until all rows generated
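In TypeScript, the generator can be an async generator function that yields one formatted SQL chunk per batch. The sketch below is illustrative rather than MockBlast's actual code; `ColumnDef` and `formatInsert` are assumed stand-ins for the real schema type and SQL formatter (a possible formatter is sketched under Batching Strategy below).

```ts
// Illustrative sketch only. ColumnDef and formatInsert are assumed stand-ins
// for MockBlast's real schema type and SQL formatter.
type ColumnDef = { name: string; mock: () => string | number };
declare function formatInsert(schema: ColumnDef[], rows: (string | number)[][]): string;

async function* generateRows(
  schema: ColumnDef[],
  count: number,
  batchSize = 1000
): AsyncGenerator<string> {
  for (let offset = 0; offset < count; offset += batchSize) {
    const size = Math.min(batchSize, count - offset);
    // Build one batch of rows from the per-column mock generators.
    const rows = Array.from({ length: size }, () => schema.map((col) => col.mock()));
    // One multi-row INSERT statement per batch.
    yield formatInsert(schema, rows);
    // Hand control back to the event loop so other requests can run.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
}
```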
2. ReadableStream for HTTP Streaming
// Pseudo-code: Next.js Route Handler
POST /api/generate:
  create ReadableStream
  for each chunk from generator:
    encode chunk as text
    enqueue to stream
    stream sends to client immediately
  close stream when done
  return Response with stream
    headers: chunked transfer encoding
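Wiring that generator into an App Router Route Handler can look roughly like this sketch (assuming a route at `app/api/generate/route.ts` and a hypothetical `{ schema, rowCount }` request body; this is not MockBlast's exact handler):

```ts
// app/api/generate/route.ts — illustrative sketch, not the production handler.
export async function POST(req: Request): Promise<Response> {
  // Hypothetical request shape: { schema: ColumnDef[]; rowCount: number }
  const { schema, rowCount } = await req.json();
  const encoder = new TextEncoder();

  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      try {
        // Each enqueued chunk is flushed to the client as soon as it is ready.
        for await (const chunk of generateRows(schema, rowCount)) {
          controller.enqueue(encoder.encode(chunk));
        }
        controller.close();
      } catch (err) {
        controller.error(err);
      }
    },
  });

  // No Content-Length header: the response goes out with chunked transfer
  // encoding, so bytes reach the client while later batches are still being
  // generated. A pull()-based source (see Backpressure Handling below)
  // additionally pauses generation when the client falls behind.
  return new Response(stream, {
    headers: {
      "Content-Type": "application/sql; charset=utf-8",
      "Cache-Control": "no-store",
    },
  });
}
```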
3. Client-Side Stream Consumption
// Pseudo-code: Client-side stream consumption
function downloadSQL(schema, rowCount):
  fetch streaming endpoint
  get stream reader
  while stream has data:
    read chunk
    decode chunk
    append to SQL string
    update progress bar
  when stream complete:
    download SQL file
Benefits:
- Immediate feedback
- Progressive download
- Memory efficient
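On the client this can be plain `fetch` plus a stream reader. A minimal sketch, assuming the `/api/generate` endpoint above and a hypothetical `onProgress` callback driving the progress bar:

```ts
// Sketch of client-side stream consumption; onProgress is a hypothetical callback.
async function downloadSQL(
  schema: unknown,
  rowCount: number,
  onProgress: (bytesReceived: number) => void
): Promise<void> {
  const res = await fetch("/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ schema, rowCount }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let sql = "";
  let received = 0;

  // Read chunks as they arrive instead of waiting for the full response.
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    received += value.byteLength;
    sql += decoder.decode(value, { stream: true });
    onProgress(received);
  }
  sql += decoder.decode(); // flush any buffered bytes

  // Trigger a regular browser download of the assembled SQL file.
  const url = URL.createObjectURL(new Blob([sql], { type: "application/sql" }));
  const a = document.createElement("a");
  a.href = url;
  a.download = "mock-data.sql";
  a.click();
  URL.revokeObjectURL(url);
}
```

Note that this version still assembles the whole SQL string in memory before downloading it; the File System Access API variant described under Lessons Learned below avoids that for very large exports.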
Key Optimizations
🎯 Batching Strategy
Generate rows in batches of 1000 (see the sketch after this list):
- Reduces function call overhead
- Amortizes SQL formatting costs
- Balances memory vs. throughput
- 1000 rows = ~100KB chunks
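For illustration, here is one possible shape for the `formatInsert` helper assumed earlier: it turns a 1,000-row batch into a single multi-row INSERT statement. The table name and quoting are simplified for the sketch; real code needs dialect-aware escaping and type handling.

```ts
// One possible shape for the formatInsert helper assumed earlier (sketch only).
// Quoting is simplified; real code needs dialect-aware escaping.
function formatInsert(
  schema: { name: string }[],
  rows: (string | number)[][],
  table = "mock_data" // hypothetical table name
): string {
  const columns = schema.map((c) => c.name).join(", ");
  const values = rows
    .map((row) => {
      const vals = row
        .map((v) =>
          typeof v === "number" ? String(v) : `'${String(v).replace(/'/g, "''")}'`
        )
        .join(", ");
      return `(${vals})`;
    })
    .join(",\n  ");
  // ~1,000 rows per statement works out to chunks of roughly 100KB of SQL text.
  return `INSERT INTO ${table} (${columns}) VALUES\n  ${values};\n`;
}
```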
⚡ Event Loop Yielding
Prevent blocking the event loop:
// Pseudo-code: Event loop yielding
yield control to event loop
  → allows other requests to process
  → prevents blocking
  → maintains responsiveness
Yields control every 1000 rows so other requests can be processed.
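The "yield control" step can be as small as awaiting a macrotask between batches. A sketch (`setImmediate` is Node-specific; `setTimeout(..., 0)` works in both Node and the browser):

```ts
// Let other pending requests run before generating the next batch.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setImmediate(resolve));

// Inside the generator loop:
//   yield formatInsert(schema, rows);
//   await yieldToEventLoop();
```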
📊 Memory Efficiency
Constant memory usage:
- Only 1 batch in memory at once
- Previous batches are GC'd
- 10M rows = same memory as 1K
- Server RAM: ~50MB per request
🚀 Backpressure Handling
ReadableStream handles backpressure (see the sketch after this list):
- Pauses generation if client is slow
- Prevents memory buildup
- Automatic flow control
- No manual buffering needed
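One way to get that automatic pausing is a `pull()`-based underlying source: the runtime only calls `pull()` when the stream's internal queue is below its high-water mark, so a slow reader stops the generator from advancing. A sketch of that variant (an alternative to the `start()`-based handler above, not necessarily what MockBlast ships):

```ts
// Pull-based variant: the generator is only advanced when the consumer
// is ready for more data, so backpressure is handled for free.
function sqlStream(rows: AsyncGenerator<string>): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    async pull(controller) {
      // Called only when the internal queue has room for another chunk.
      const { value, done } = await rows.next();
      if (done) {
        controller.close();
      } else {
        controller.enqueue(encoder.encode(value));
      }
    },
    cancel() {
      // Client disconnected or aborted: stop generating rows.
      void rows.return?.(undefined);
    },
  });
}
```

The `cancel()` hook also stops generation early if the user aborts the download.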
Performance Metrics
| Rows | Time to First Byte | Total Time | Server Memory |
|---|---|---|---|
| 1,000 | ~50ms | ~200ms | ~30MB |
| 10,000 | ~50ms | ~1.5s | ~35MB |
| 100,000 | ~50ms | ~12s | ~40MB |
| 1,000,000 | ~50ms | ~2 min | ~50MB |
* Tested on: 10 columns, mixed data types, AWS Lambda (1GB RAM)
Error Handling in Streams
Streaming complicates error handling since HTTP headers are sent before you know if generation will succeed:
// Pseudo-code: Error handling in streams
create stream:
  try:
    generate and stream chunks
    close stream on success
  catch error:
    // HTTP headers already sent!
    // Can't change status code
    // Send error marker in stream
    enqueue error message to stream
    signal error to client
Client detects error marker and displays it
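A sketch of that pattern in the route handler (the `-- GENERATION-ERROR:` marker string is an assumed format for illustration, and `schema`, `rowCount`, and `generateRows` come from the handler sketch above):

```ts
// Sketch of the error-marker pattern; the marker format is hypothetical.
const ERROR_MARKER = "-- GENERATION-ERROR: ";
const encoder = new TextEncoder();

const stream = new ReadableStream<Uint8Array>({
  async start(controller) {
    try {
      for await (const chunk of generateRows(schema, rowCount)) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.close();
    } catch (err) {
      // Headers and a 200 status are already on the wire, so surface the
      // failure inside the body where the client can detect it.
      const message = err instanceof Error ? err.message : "generation failed";
      controller.enqueue(encoder.encode(`\n${ERROR_MARKER}${message}\n`));
      // Close (rather than error) so the marker chunk is actually delivered.
      controller.close();
    }
  },
});

// Client side, while reading chunks:
//   if (text.includes(ERROR_MARKER)) showError(text.split(ERROR_MARKER)[1]);
```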
Lessons Learned
1. Streaming is Not Always Faster
For small datasets (< 1000 rows), the overhead of streaming actually makes it slower than buffering. We automatically choose the best strategy based on row count.
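A hedged sketch of that decision; the 1,000-row threshold comes from the text, while `streamingResponse` is a hypothetical wrapper around the streaming handler shown earlier:

```ts
// Hypothetical strategy switch between buffering and streaming.
declare function streamingResponse(schema: ColumnDef[], rowCount: number): Response;

const STREAM_THRESHOLD = 1_000;

async function respond(schema: ColumnDef[], rowCount: number): Promise<Response> {
  if (rowCount < STREAM_THRESHOLD) {
    // Small exports: cheaper to build the whole string and send it in one go.
    let sql = "";
    for await (const batch of generateRows(schema, rowCount)) sql += batch;
    return new Response(sql, {
      headers: { "Content-Type": "application/sql; charset=utf-8" },
    });
  }
  // Large exports: stream, as in the route handler sketch above.
  return streamingResponse(schema, rowCount);
}
```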
2. Batch Size Matters
We tested batch sizes from 100 to 10,000; the sweet spot is 1,000 rows (~100KB). Smaller batches increase overhead, while larger ones risk memory pressure and unresponsiveness.
3. Client-Side Buffering Can Be Tricky
Browsers buffer streamed data in memory. For very large downloads (100MB+), we had to implement a download-as-you-go strategy where chunks are written to disk progressively using the File System Access API.
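Where the File System Access API is available (Chromium-based browsers, in a secure context, after a user gesture), the response body can be piped straight to disk. A sketch of that download-as-you-go path (not the exact MockBlast code; TypeScript may need the WICG file-system-access type definitions for `showSaveFilePicker`):

```ts
// Stream the response body straight to a file on disk instead of
// accumulating it in memory. Requires the File System Access API.
async function downloadToDisk(res: Response): Promise<void> {
  if (!res.body) throw new Error("response has no body");

  // Ask the user where to save the file (must follow a user gesture).
  const handle = await window.showSaveFilePicker({
    suggestedName: "mock-data.sql",
    types: [{ description: "SQL", accept: { "application/sql": [".sql"] } }],
  });

  const writable = await handle.createWritable();
  // pipeTo writes each network chunk to disk as it arrives and closes the
  // file when the stream ends, so client memory stays flat.
  await res.body.pipeTo(writable);
}
```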
4. Progress Tracking Requires Planning
Since Content-Length is unknown with streaming, we send progress metadata in the stream itself (e.g., every 10,000 rows) so the client can show accurate progress bars.
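For example, the generator can interleave a SQL comment every 10,000 rows and the client can parse it out of each decoded chunk. The `-- progress:` format below is an assumption for illustration, not a documented wire format:

```ts
// Server side: interleave hypothetical progress comments with the data.
//   yield `-- progress: ${offset + size}/${count}\n`;

// Client side: pull progress markers out of each decoded chunk.
const PROGRESS_RE = /^-- progress: (\d+)\/(\d+)$/gm;

function updateProgress(chunkText: string, onProgress: (pct: number) => void): void {
  for (const match of chunkText.matchAll(PROGRESS_RE)) {
    const done = Number(match[1]);
    const total = Number(match[2]);
    onProgress(Math.round((done / total) * 100));
  }
}
```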
Try It Yourself
Experience streaming data generation in action:
1. Create a schema with 10+ columns
2. Set row count to 100,000 or more
3. Click generate and watch the progress bar
4. Notice how data starts downloading immediately
Related Resources
PostgreSQL Mock Data Generator
Generate millions of rows of PostgreSQL mock data using server-side streaming. Full support for JSONB, UUID, arrays, and Postgres-specific types.
MySQL Dummy Data Generator
Stream millions of MySQL rows efficiently with proper datetime formatting, boolean handling, and MySQL-specific syntax. Constant memory usage.
MongoDB Mock Data Generator
Generate large MongoDB datasets with native ObjectId, ISODate wrappers, and MongoDB shell scripts. Streaming support for millions of documents.
How to Generate PostgreSQL Seed Data
Step-by-step guide to generating realistic PostgreSQL seed data at scale with JSONB, UUID, arrays, and foreign keys.
Frequently Asked Questions
- Why stream data instead of sending it all at once?
- Streaming provides immediate feedback to users, reduces memory usage on both client and server, and enables progressive rendering. Instead of waiting for 1 million rows to generate before sending anything, we start delivering data as soon as the first rows are ready.
- How does streaming work in Next.js App Router?
- Next.js App Router supports streaming through Route Handlers that return ReadableStream responses. The server generates data chunks and writes them to the stream, which are immediately sent to the client without buffering the entire response in memory.
- What's the difference between Web Streams and Server-Sent Events?
- Web Streams (ReadableStream) provide a lower-level, more flexible API for streaming binary or text data. SSE is higher-level and text-only, designed for real-time updates. For bulk data transfer like SQL generation, Web Streams are more efficient.
- Can you really generate 1 million rows without running out of memory?
- Yes! By using generator functions and streaming, we never hold all rows in memory at once. We generate rows in batches (typically 1000 at a time), stream them to the client, and then move on to the next batch. Memory usage stays constant regardless of total row count.
- How do you handle errors during streaming?
- Once streaming starts, HTTP headers are already sent, so you can't return a different status code. We handle errors by sending error markers in the stream itself, which the client can detect and display. We also implement retry logic and idempotent generation for reliability.