Writing a SQL Parser in TypeScript: Regex Nightmares & Solutions

Deep dive into the challenges of parsing SQL CREATE TABLE statements in TypeScript. Learn from our mistakes and discover practical solutions.

The Challenge: Parsing CREATE TABLE

When we started building MockBlast, we needed to parse CREATE TABLE statements to extract:

  • Table name
  • Column names and data types
  • Constraints (PRIMARY KEY, UNIQUE, NOT NULL, DEFAULT)
  • Foreign key relationships

Here's what we tried and what actually worked.

Attempt #1: One Giant Regex (Don't Do This)

Our first attempt was to parse the entire CREATE TABLE statement with one regex:

// Pseudo-code: One giant regex (DON'T DO THIS)
regex = /CREATE TABLE ... (everything) ... /i

// Problems:
// - Catastrophic backtracking
// - Can't handle nested parentheses
// - Impossible to debug

Problems:

  • • Catastrophic backtracking on complex inputs (5+ seconds)
  • • Doesn't handle nested parentheses (e.g., DECIMAL(10,2))
  • • Impossible to debug when it fails
  • • Can't distinguish column definitions from constraints

Attempt #2: Multi-Stage Parsing (Better)

We broke parsing into stages:

// Pseudo-code: Multi-stage parsing approach
function parseCreateTable(sql):
  1. Preprocess: strip comments, normalize whitespace
  2. Extract table name
  3. Extract column definitions
  4. Parse each column individually
  5. Extract constraints
  6. Return structured schema object

Key Parsing Challenges & Solutions

Challenge 1: Splitting Columns (Commas in Types)

Can't split on commas because types contain commas:

price DECIMAL(10,2),  -- Comma inside type definition
tags TEXT[],           -- Comma after
status ENUM('active','inactive')  -- Commas everywhere!

Solution: State Machine

// Pseudo-code: State machine for splitting columns
function splitColumns(columnStr):
  track parenDepth = 0
  track inQuotes = false
  currentColumn = ""
  
  for each character:
    if quote: toggle inQuotes
    if '(' and not in quotes: parenDepth++
    if ')' and not in quotes: parenDepth--
    if ',' and parenDepth == 0 and not in quotes:
      save currentColumn
      reset currentColumn
    else:
      append to currentColumn
  
  return all columns

Challenge 2: Case-Insensitive Matching

SQL is case-insensitive, but regex /i flag isn't enough:

-- All valid
PRIMARY KEY
primary key
Primary Key
PrImArY kEy

Solution: Normalize Before Parsing

// Pseudo-code: Preprocessing step
function preprocess(sql):
  1. Normalize SQL keywords to uppercase
     (but preserve string literals!)
  2. Normalize whitespace (multiple spaces → single space)
  3. Remove SQL comments (-- and /* */)
  4. Trim leading/trailing whitespace
  5. Return cleaned SQL

Challenge 3: Parsing Data Types with Length

Data types can be complex:

VARCHAR(255)
DECIMAL(10,2)
BIT(8)
INT UNSIGNED
TIMESTAMP WITH TIME ZONE

Solution: Type-Specific Parsing

// Pseudo-code: Parse data types
function parseDataType(typeStr):
  extract base type name
  extract length/precision if present (e.g., VARCHAR(255))
  extract modifiers (UNSIGNED, SIGNED, etc.)
  
  return {
    baseType: normalized type name,
    length: extracted length,
    precision: extracted precision,
    modifiers: array of modifiers
  }

Challenge 4: Default Values with Expressions

DEFAULT can be literals or expressions:

created_at TIMESTAMP DEFAULT NOW()
active BOOLEAN DEFAULT true
name VARCHAR(100) DEFAULT 'Unknown'
counter INT DEFAULT (0)

Solution: Extract Until Next Keyword

// Pseudo-code: Extract default values
function extractDefault(columnDef):
  find DEFAULT keyword
  extract value until next keyword or end
  
  if value is function call (ends with '()'):
    return as expression
  else if value is quoted string:
    return unquoted string
  else:
    return as literal (number/boolean)

The Final Parser Architecture

// Pseudo-code: Final parser architecture
function parseCreateTable(sql):
  1. Preprocess SQL (clean, normalize)
  2. Validate it's CREATE TABLE statement
  3. Extract table name
  4. Extract column definitions block
  5. Split into individual definitions
  6. Separate columns from constraints
  7. Parse each column/constraint
  8. Infer relationships from foreign keys
  9. Return structured schema object

Testing Strategy

We built a test suite with real-world CREATE TABLE statements:

  • 100+ test cases: From simple to complex schemas
  • Database-specific tests: PostgreSQL, MySQL, SQLite quirks
  • Edge cases: Weird formatting, comments, multi-line
  • Fuzzing: Random SQL generation to find crashes
// Pseudo-code: Testing approach
test suite:
  - Simple table parsing
  - Database-specific types (SERIAL, UUID, etc.)
  - Complex constraints
  - Edge cases (comments, formatting)
  - Real-world SQL from open-source projects
  
  Each test:
    1. Provide SQL input
    2. Parse with our parser
    3. Verify expected structure
    4. Check error handling

Lessons Learned

1. Don't Fight Regex, Use It Strategically

Regex is great for specific patterns (e.g., extracting table names). Don't try to parse the entire syntax tree with one regex. Use it as one tool in your toolbox, not the only tool.

2. Preprocess Aggressively

Normalize input before parsing. Strip comments, normalize whitespace, uppercase keywords. Makes your parser simpler and more reliable.

3. Test with Real-World SQL

Don't write test cases from scratch. Collect CREATE TABLE statements from popular open-source projects (WordPress, Discourse, Mastodon). Real SQL is messier than you think.

4. Be Pragmatic About Coverage

You can't support every SQL feature. Focus on the 90% case and provide good error messages for unsupported syntax. Users appreciate "this isn't supported yet" over silent failures.

5. Performance Matters in Browsers

Our parser runs in browsers, so performance is critical. We can parse most CREATE TABLE statements in < 5ms. Avoid exponential regex backtracking and use linear state machines where possible.

Try MockBlast's Parser

Experience instant SQL parsing in your browser:

  1. 1.Paste any CREATE TABLE statement
  2. 2.See instant validation and schema preview
  3. 3.Your SQL never leaves your browser
  4. 4.Works offline, works fast

Related Resources

Frequently Asked Questions

Why not use an existing SQL parser library?
Most SQL parser libraries are huge (5MB+), support features we don't need (SELECT, JOIN, etc.), and don't work well in browsers. For MockBlast's use case (parsing CREATE TABLE only), a lightweight custom parser made more sense.
Can you really parse SQL with regex?
Regex alone can't parse SQL completely (it's not a regular language). But for the 90% case of well-formatted CREATE TABLE statements, carefully crafted regex with some post-processing works surprisingly well. For edge cases, we fall back to manual parsing.
What SQL dialects do you support?
MockBlast's parser handles PostgreSQL, MySQL, and SQLite CREATE TABLE syntax. We support the most common features: data types, constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL), and DEFAULT values. Exotic features are best-effort.
How do you handle SQL comments and whitespace?
We preprocess the SQL to strip comments (-- and /* */) and normalize whitespace before parsing. This simplifies regex patterns and reduces edge cases. SQL formatting doesn't matter—minified or prettified both work.
What's the biggest challenge in parsing SQL?
Ambiguity and inconsistency across dialects. The same syntax can mean different things in different databases. For example, INT(11) in MySQL is a display hint, but in some databases it's a length constraint. Context matters.

Ready to Generate Mock Data?

Stop writing scripts manually. MockBlast generates production-ready seed data for Postgres, MySQL, MongoDB, and JSON in seconds.