Writing a SQL Parser in TypeScript: Regex Nightmares & Solutions
Deep dive into the challenges of parsing SQL CREATE TABLE statements in TypeScript. Learn from our mistakes and discover practical solutions.
The Challenge: Parsing CREATE TABLE
When we started building MockBlast, we needed to parse CREATE TABLE statements to extract:
- •Table name
- •Column names and data types
- •Constraints (PRIMARY KEY, UNIQUE, NOT NULL, DEFAULT)
- •Foreign key relationships
Here's what we tried and what actually worked.
Attempt #1: One Giant Regex (Don't Do This)
Our first attempt was to parse the entire CREATE TABLE statement with one regex:
// Pseudo-code: One giant regex (DON'T DO THIS) regex = /CREATE TABLE ... (everything) ... /i // Problems: // - Catastrophic backtracking // - Can't handle nested parentheses // - Impossible to debug
Problems:
- • Catastrophic backtracking on complex inputs (5+ seconds)
- • Doesn't handle nested parentheses (e.g., DECIMAL(10,2))
- • Impossible to debug when it fails
- • Can't distinguish column definitions from constraints
Attempt #2: Multi-Stage Parsing (Better)
We broke parsing into stages:
// Pseudo-code: Multi-stage parsing approach function parseCreateTable(sql): 1. Preprocess: strip comments, normalize whitespace 2. Extract table name 3. Extract column definitions 4. Parse each column individually 5. Extract constraints 6. Return structured schema object
Key Parsing Challenges & Solutions
Challenge 1: Splitting Columns (Commas in Types)
Can't split on commas because types contain commas:
price DECIMAL(10,2), -- Comma inside type definition
tags TEXT[], -- Comma after
status ENUM('active','inactive') -- Commas everywhere!Solution: State Machine
// Pseudo-code: State machine for splitting columns
function splitColumns(columnStr):
track parenDepth = 0
track inQuotes = false
currentColumn = ""
for each character:
if quote: toggle inQuotes
if '(' and not in quotes: parenDepth++
if ')' and not in quotes: parenDepth--
if ',' and parenDepth == 0 and not in quotes:
save currentColumn
reset currentColumn
else:
append to currentColumn
return all columnsChallenge 2: Case-Insensitive Matching
SQL is case-insensitive, but regex /i flag isn't enough:
-- All valid PRIMARY KEY primary key Primary Key PrImArY kEy
Solution: Normalize Before Parsing
// Pseudo-code: Preprocessing step
function preprocess(sql):
1. Normalize SQL keywords to uppercase
(but preserve string literals!)
2. Normalize whitespace (multiple spaces → single space)
3. Remove SQL comments (-- and /* */)
4. Trim leading/trailing whitespace
5. Return cleaned SQLChallenge 3: Parsing Data Types with Length
Data types can be complex:
VARCHAR(255) DECIMAL(10,2) BIT(8) INT UNSIGNED TIMESTAMP WITH TIME ZONE
Solution: Type-Specific Parsing
// Pseudo-code: Parse data types
function parseDataType(typeStr):
extract base type name
extract length/precision if present (e.g., VARCHAR(255))
extract modifiers (UNSIGNED, SIGNED, etc.)
return {
baseType: normalized type name,
length: extracted length,
precision: extracted precision,
modifiers: array of modifiers
}Challenge 4: Default Values with Expressions
DEFAULT can be literals or expressions:
created_at TIMESTAMP DEFAULT NOW() active BOOLEAN DEFAULT true name VARCHAR(100) DEFAULT 'Unknown' counter INT DEFAULT (0)
Solution: Extract Until Next Keyword
// Pseudo-code: Extract default values
function extractDefault(columnDef):
find DEFAULT keyword
extract value until next keyword or end
if value is function call (ends with '()'):
return as expression
else if value is quoted string:
return unquoted string
else:
return as literal (number/boolean)The Final Parser Architecture
// Pseudo-code: Final parser architecture function parseCreateTable(sql): 1. Preprocess SQL (clean, normalize) 2. Validate it's CREATE TABLE statement 3. Extract table name 4. Extract column definitions block 5. Split into individual definitions 6. Separate columns from constraints 7. Parse each column/constraint 8. Infer relationships from foreign keys 9. Return structured schema object
Testing Strategy
We built a test suite with real-world CREATE TABLE statements:
- •100+ test cases: From simple to complex schemas
- •Database-specific tests: PostgreSQL, MySQL, SQLite quirks
- •Edge cases: Weird formatting, comments, multi-line
- •Fuzzing: Random SQL generation to find crashes
// Pseudo-code: Testing approach
test suite:
- Simple table parsing
- Database-specific types (SERIAL, UUID, etc.)
- Complex constraints
- Edge cases (comments, formatting)
- Real-world SQL from open-source projects
Each test:
1. Provide SQL input
2. Parse with our parser
3. Verify expected structure
4. Check error handlingLessons Learned
1. Don't Fight Regex, Use It Strategically
Regex is great for specific patterns (e.g., extracting table names). Don't try to parse the entire syntax tree with one regex. Use it as one tool in your toolbox, not the only tool.
2. Preprocess Aggressively
Normalize input before parsing. Strip comments, normalize whitespace, uppercase keywords. Makes your parser simpler and more reliable.
3. Test with Real-World SQL
Don't write test cases from scratch. Collect CREATE TABLE statements from popular open-source projects (WordPress, Discourse, Mastodon). Real SQL is messier than you think.
4. Be Pragmatic About Coverage
You can't support every SQL feature. Focus on the 90% case and provide good error messages for unsupported syntax. Users appreciate "this isn't supported yet" over silent failures.
5. Performance Matters in Browsers
Our parser runs in browsers, so performance is critical. We can parse most CREATE TABLE statements in < 5ms. Avoid exponential regex backtracking and use linear state machines where possible.
Try MockBlast's Parser
Experience instant SQL parsing in your browser:
- 1.Paste any CREATE TABLE statement
- 2.See instant validation and schema preview
- 3.Your SQL never leaves your browser
- 4.Works offline, works fast
Related Resources
PostgreSQL Mock Data Generator
Try MockBlast's SQL parser in action. Import your PostgreSQL CREATE TABLE statements and generate mock data instantly. All parsing happens locally in your browser.
MySQL Dummy Data Generator
Import MySQL CREATE TABLE statements and see how MockBlast's parser handles MySQL-specific syntax, AUTO_INCREMENT, and TINYINT types.
Foreign Keys & Referential Integrity
Learn how MockBlast's parser extracts foreign key relationships from CREATE TABLE statements and maintains referential integrity.
Test Data with Foreign Keys
Complete guide to creating test data with foreign key relationships. See how MockBlast's parser handles complex constraint definitions.
Frequently Asked Questions
- Why not use an existing SQL parser library?
- Most SQL parser libraries are huge (5MB+), support features we don't need (SELECT, JOIN, etc.), and don't work well in browsers. For MockBlast's use case (parsing CREATE TABLE only), a lightweight custom parser made more sense.
- Can you really parse SQL with regex?
- Regex alone can't parse SQL completely (it's not a regular language). But for the 90% case of well-formatted CREATE TABLE statements, carefully crafted regex with some post-processing works surprisingly well. For edge cases, we fall back to manual parsing.
- What SQL dialects do you support?
- MockBlast's parser handles PostgreSQL, MySQL, and SQLite CREATE TABLE syntax. We support the most common features: data types, constraints (PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL), and DEFAULT values. Exotic features are best-effort.
- How do you handle SQL comments and whitespace?
- We preprocess the SQL to strip comments (-- and /* */) and normalize whitespace before parsing. This simplifies regex patterns and reduces edge cases. SQL formatting doesn't matter—minified or prettified both work.
- What's the biggest challenge in parsing SQL?
- Ambiguity and inconsistency across dialects. The same syntax can mean different things in different databases. For example, INT(11) in MySQL is a display hint, but in some databases it's a length constraint. Context matters.