Text-to-SQL AI
A full-stack text-to-SQL application that converts natural language into any SQL operation and executes it in real time: SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, REPLACE, PRAGMA, and EXPLAIN. The Python 3.13+ FastAPI backend provides live schema introspection (get_live_schema() queries sqlite_master, so newly created tables are immediately available to the LLM), OpenAI GPT-4o-mini integration with schema-aware prompts, and a universal execute_sql() supporting all SQL operations.
Role
AI Engineer & Full-stack Developer
Team
Solo
Company/Organization
Personal Project
The Problem
Existing text-to-SQL solutions only supported SELECT queries (read-only), limiting users to data retrieval without ability to insert records, update...
Database management tasks required SQL expertise — users needed to manually write INSERT statements for new records, UPDATE for modifications, DELETE...
No live schema introspection meant newly created tables weren't available to the LLM context — if user created a 'notes' table via natural language,...
DML/DDL operations had no intelligent result formatting — INSERT/UPDATE/DELETE returned raw database responses without showing rows affected count....
No unified interface for full database lifecycle management — needed separate tools for querying (SELECT), data entry (INSERT), updates (UPDATE),...
Natural language variations for database operations weren't handled — 'add a customer' vs 'insert customer' vs 'create new customer record' all meant...
Existing text-to-SQL demos focused on analytics (SELECT with JOINs, aggregations) but ignored practical database management tasks that non-technical...
The Solution
Built comprehensive text-to-SQL application supporting full SQL operation spectrum with live schema introspection and intelligent result formatting.
Backend (Python 3.13 + FastAPI + SQLite3)
main.py — FastAPI app with CORS middleware and 3 REST endpoints:
`POST /execute-query` — Accepts natural language question, calls LLM service to generate SQL (any operation type:...
`GET /schema` — Returns live database schema via get_live_schema() (queries sqlite_master for current table list, columns, types, foreign...
`GET /health` — Health check returning API status and OPENAI_API_KEY presence (never exposes actual key)
database.py — SQLite3 operations with full SQL support:
init_database() — Creates 4 tables (customers, products, orders, order_items) with foreign key constraints, seeds 84 rows of sample data (15...
execute_sql(sql: str) — Universal SQL execution function:
SELECT: Returns rows as list of dicts with column names
INSERT/UPDATE/DELETE/REPLACE: Returns rows affected count (cursor.rowcount)
CREATE/ALTER/DROP: Returns execution status ('Table created successfully', 'Table altered successfully', 'Table dropped successfully')
PRAGMA: Returns metadata results (table_info, foreign_key_list, index_list)
EXPLAIN: Returns query plan steps
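The per-operation dispatch above can be sketched with the stdlib sqlite3 module. The status-message wording is illustrative, and the in-memory connection stands in for the real database.db file:

```python
import sqlite3

_conn = sqlite3.connect(":memory:")  # the real app opens database.db

def execute_sql(sql: str):
    """Run any SQL statement; shape the result by operation type (sketch)."""
    op = sql.strip().split()[0].upper()
    cur = _conn.execute(sql)
    if op in ("SELECT", "PRAGMA", "EXPLAIN"):
        # Result-set operations: rows as list of dicts keyed by column name.
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]
    if op in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
        _conn.commit()
        return cur.rowcount  # rows affected
    # DDL (CREATE / ALTER / DROP): return a status string.
    _conn.commit()
    return f"{op.title()} statement executed successfully"
```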
get_live_schema() — Queries sqlite_master table for current database state:
`SELECT name, sql FROM sqlite_master WHERE type='table'` gets all table definitions
For each table, runs `PRAGMA table_info(table_name)` for column metadata
Runs `PRAGMA foreign_key_list(table_name)` for relationship info
Returns JSON with tables, columns, types, foreign keys — includes tables created during session
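The introspection steps above can be sketched directly against sqlite3, relying on the documented tuple layouts of `PRAGMA table_info` (cid, name, type, notnull, dflt_value, pk) and `PRAGMA foreign_key_list` (id, seq, table, from, to, ...); the exact output shape here is illustrative:

```python
import sqlite3

def get_live_schema(conn: sqlite3.Connection) -> dict:
    """Build a schema dict from the database's current state."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        cols = conn.execute(f"PRAGMA table_info({name})").fetchall()
        fks = conn.execute(f"PRAGMA foreign_key_list({name})").fetchall()
        schema[name] = {
            "columns": [{"name": c[1], "type": c[2]} for c in cols],
            "foreign_keys": [
                {"column": fk[3], "references": f"{fk[2]}({fk[4]})"} for fk in fks
            ],
        }
    return schema
```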
llm.py — OpenAI GPT-4o-mini integration:
generate_sql(question: str, schema: dict) — Constructs system prompt with live database schema (all tables including newly created ones, column...
System prompt template: "You are a SQL expert. Given this database schema: [live schema JSON with all current tables], convert the following natural...
Temperature 0.3 for balanced SQL generation (deterministic for SELECT/INSERT/UPDATE, slightly creative for CREATE TABLE schema design)
Supports all SQL operation types — no restrictions on query type
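A sketch of the llm.py flow, assuming the openai>=1.x client API; the prompt wording only approximates the template quoted above, and the import is done lazily so the sketch loads without the package installed:

```python
import json

def build_prompt(schema: dict) -> str:
    """System prompt carrying the live schema (wording is approximate)."""
    return (
        "You are a SQL expert. Given this database schema: "
        + json.dumps(schema)
        + ", convert the following natural language question into a single "
          "SQLite statement. Return only the SQL, no explanation."
    )

def generate_sql(question: str, schema: dict) -> str:
    from openai import OpenAI  # lazy import; reads OPENAI_API_KEY from env
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.3,  # deterministic for DML, some latitude for DDL design
        messages=[
            {"role": "system", "content": build_prompt(schema)},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()
```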
models.py — Pydantic models for request/response validation:
`QueryRequest` (question: str) for POST /execute-query
`QueryResponse` (sql: str, results: List[dict] | int | str, operation_type: str, execution_time: float, error: Optional[str])
results type varies: List[dict] for SELECT, int for DML rows affected, str for DDL status
operation_type: 'SELECT' | 'INSERT' | 'UPDATE' | 'DELETE' | 'CREATE' | 'ALTER' | 'DROP' | 'REPLACE' | 'PRAGMA' | 'EXPLAIN'
`SchemaResponse` (tables: dict) for GET /schema with live table info
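The operation_type field can be derived from the generated statement's leading keyword; a minimal sketch (the helper name is illustrative), treating `WITH ... SELECT` CTEs as SELECT per the keyword mapping in the operations table below:

```python
def classify_operation(sql: str) -> str:
    """Map a SQL statement to its operation_type by leading keyword."""
    first = sql.lstrip().split(None, 1)[0].upper()
    # A WITH-prefixed CTE is still a read query.
    return "SELECT" if first == "WITH" else first
```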
Frontend (React 19 + Vite)
App.jsx — Main UI component with modern responsive design:
Input field for natural language questions with submit button
12 pre-built suggestion buttons covering all operation types:
SELECT: "Show all customers from USA", "List all products in Electronics category"
INSERT: "Add a customer named Jane from London", "Insert a product named Wireless Mouse at $25"
UPDATE: "Update all pending orders to processing", "Change price of Laptop to $1200"
DELETE: "Delete all cancelled orders", "Remove products with zero stock"
REPLACE: "Upsert a product named Webcam HD at $79.99" (INSERT or UPDATE if exists)
CREATE TABLE: "Create a notes table with id, content, created_at", "Make a tags table with id and name"
ALTER TABLE: "Add a discount_percent column to products", "Add email column to customers"
DROP TABLE: "Drop the notes table", "Delete the tags table"
PRAGMA: "Show column info for the orders table", "List foreign keys in order_items"
EXPLAIN: "Show the query plan for selecting delivered orders", "Explain how to find top customers"
Query-type badges with color coding:
Blue: SELECT (read operations)
Green: INSERT (create new records)
Yellow: UPDATE (modify existing records)
Red: DELETE (remove records)
Purple: CREATE/ALTER/DROP (schema operations)
Teal: PRAGMA/EXPLAIN (metadata and analysis)
Intelligent result display:
SELECT: Data table with columns and rows (responsive, scrollable, sortable)
INSERT/UPDATE/DELETE/REPLACE: Rows affected badge ("3 rows affected", "1 row inserted")
CREATE/ALTER/DROP: Success status message ("Table 'notes' created successfully")
PRAGMA: Metadata table (column info, foreign keys, indexes)
EXPLAIN: Query plan visualization (step-by-step execution strategy)
Generated SQL display with syntax highlighting (color-coded keywords, tables, columns)
Execution time badge (milliseconds)
Error handling with user-friendly messages
api.js — Axios wrapper for backend communication:
`executeQuery(question)` — POST /execute-query with natural language question
`getSchema()` — GET /schema for live schema info (includes newly created tables)
Base URL from environment variable (VITE_API_URL, defaults to http://localhost:8000)
App.css — Modern responsive styling with dark theme, gradient backgrounds, glassmorphism effect on cards, color-coded badges, smooth transitions,...
Live Schema Introspection
Key innovation — newly created tables are immediately available to LLM:
User enters: "Create a notes table with id, content, created_at"
LLM generates: `CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)`
execute_sql() runs CREATE TABLE, updates database
Next query: "Add a note saying 'Hello World'"
get_live_schema() queries sqlite_master, finds 'notes' table in current database state
LLM receives updated schema including 'notes' table in system prompt
LLM generates: `INSERT INTO notes (content) VALUES ('Hello World')`
Works immediately without application restart
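The walkthrough above can be demonstrated end to end with plain sqlite3 (the LLM steps replaced by their generated SQL): a table created mid-session is visible to the next introspection pass before the follow-up INSERT is generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def current_tables(conn):
    """What the LLM would see in its next prompt: the live table list."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return {name for (name,) in rows}

# Steps 1-3: the LLM's generated CREATE TABLE is executed.
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT, "
    "created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
)
# Steps 4-7: re-introspection before the next question already sees 'notes',
# so the follow-up INSERT targets a valid table.
assert "notes" in current_tables(conn)
conn.execute("INSERT INTO notes (content) VALUES ('Hello World')")
```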
Database Schema (SQLite3)
4 tables with foreign key relationships (auto-created on startup):
customers (15 rows) — id, name, email, country, created_at
products (15 rows) — id, name, category, price, stock, created_at
orders (20 rows) — id, customer_id (FK → customers), order_date, status (pending/shipped/delivered), total_amount
order_items (34 rows) — id, order_id (FK → orders), product_id (FK → products), quantity, price
Relationship flow: customers → orders → order_items ← products (many-to-many via order_items junction table)
Users can create additional tables via natural language — all immediately available to LLM.
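The four-table schema can be sketched as DDL inferred from the column lists above; column types, NOT NULL choices, and REFERENCES clauses are assumptions, not the project's exact definitions:

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT,
    country TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price REAL,
    stock INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TEXT,
    status TEXT,
    total_amount REAL
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    price REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```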
Supported SQL Operations (10 types)
| Type | Keywords | Example Prompt | Generated SQL |
|------|----------|----------------|---------------|
| SELECT | `SELECT`, `WITH` | "Show all... | |
Makefile Automation (6 commands)
`make install` — Install Python dependencies (requirements.txt) + Node dependencies (frontend/package.json)
`make dev` — Run backend (uvicorn on port 8000) + frontend (Vite on port 5173) concurrently
`make backend` — Backend only (uvicorn main:app --reload --port 8000)
`make frontend` — Frontend only (cd frontend && npm run dev)
`make stop` — Kill dev server processes (backend/frontend)
`make clean` — Remove build artifacts (__pycache__, frontend/dist/, database.db, node_modules, .ruff_cache)
GitHub Actions CI/CD (.github/workflows/ci.yml)
Runs on every push and PR:
Security Check — Scans for `.env`, `.db`, API keys in tracked files (fails if found)
Backend Tests — Python syntax compilation, Black formatting check, mypy type checking, Bandit security scan
Frontend Tests — ESLint code quality, Vite production build verification
Integration Check — Required files present (.env.example safety check, README exists, Makefile commands valid)
Deployment Architecture
Backend → Railway / Render / GCP Cloud Run:
Set environment variable `OPENAI_API_KEY` via platform dashboard
Deploy from GitHub repo (auto-deploy on push to main)
Start command: `uvicorn main:app --host 0.0.0.0 --port $PORT`
Health check endpoint: `/health` for container readiness
Frontend → Vercel / Netlify:
Set root directory to `frontend`
Set environment variable `VITE_API_URL` to backend URL
Build command: `npm run build`
Output directory: `dist`
Deploy from GitHub repo (auto-deploy on push to main)
Security Configuration
API key server-only: `OPENAI_API_KEY` loaded from environment on backend, never sent to frontend. /health returns 'OPENAI_API_KEY: set' or 'missing'...
.env gitignored: `.gitignore` blocks `.env`, `.env.local`, `.env.*`, `*.db`, `*.sqlite` ensuring secrets and database never committed. `.env.example`...
Database files gitignored: `database.db`, `*.sqlite` blocked to prevent committing production data
No SQL injection protection needed: the LLM generates trusted SQL, the application controls the generation context (schema-aware prompts), regex validation optional (can...
CORS configuration: Backend explicitly allows frontend origin (localhost:5173 in dev, Vercel URL in prod)
Cost Analysis
GPT-4o-mini pricing (~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens):
Average query: 500 input tokens (schema + question) + 50 output tokens (SQL)
Cost per query: ~$0.0001 ($0.000075 input + $0.00003 output, about 0.01 cents)
10,000 queries: ~$1.05
100,000 queries: ~$10.50
Extremely cost-efficient for production use.
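A quick check of the per-query arithmetic under the stated pricing and token counts:

```python
# GPT-4o-mini list prices as stated above (USD per 1M tokens).
INPUT_PER_M, OUTPUT_PER_M = 0.15, 0.60
input_tokens, output_tokens = 500, 50  # schema + question, generated SQL

cost = input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M
# 0.000075 + 0.00003 = 0.000105, i.e. roughly $0.0001 per query
```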
Design Decisions
Chose GPT-4o-mini over GPT-3.5-turbo for SQL generation — same cost tier but better instruction following for complex CREATE TABLE statements, ALTER...
Implemented full SQL support (all 10 operation types) instead of read-only SELECT — enables complete database lifecycle management via natural...
Built live schema introspection via get_live_schema() querying sqlite_master — newly created tables immediately available to LLM without application...
Designed intelligent result formatting based on operation type — SELECT returns data table, INSERT/UPDATE/DELETE show rows affected count,...
Added query-type badges with color coding (blue SELECT, green INSERT, yellow UPDATE, red DELETE, purple CREATE/ALTER/DROP, teal...
Used universal execute_sql() function handling all operation types instead of separate functions per type — simpler architecture, easier to maintain,...
Schema-aware prompts with live table definitions, column types, foreign keys sent to OpenAI — ensures generated SQL references valid tables/columns...
React 19 + Vite over Next.js — simpler for single-page interactive UI, instant HMR during development, no SSR needed for database management interface....
12 example queries covering all operation types — onboards users with demonstrations of full capabilities (not just SELECT). Shows practical use cases:...
Temperature 0.3 for balanced SQL generation — deterministic enough for SELECT/INSERT/UPDATE consistency (same question → same query), creative enough...
Makefile for workflow automation — 6 commands (install, dev, backend, frontend, stop, clean) provide consistent setup. Reduced from 10 to 6 by removing...
GitHub Actions CI/CD with comprehensive checks — security scan (no .env/.db files), backend tests (syntax/formatting/types/security), frontend tests...
Pydantic models with operation_type field — frontend knows whether to display data table (SELECT), rows affected (DML), or status message (DDL)....
No SQL injection protection needed — LLM generates trusted SQL from controlled context (schema-aware prompts), application doesn't accept arbitrary SQL...
Tradeoffs & Constraints
SQLite3 single-connection writes — sufficient for moderate-scale use (<100 concurrent users) but wouldn't scale to high-concurrency write workloads...
OpenAI API latency — typical 1-2s response time for SQL generation (GPT-4o-mini) adds to end-to-end query latency. Caching common questions (Redis with...
No query result pagination — returns all rows from SELECT queries. For large result sets (10K+ rows), would need LIMIT/OFFSET pagination or...
Sample database with 84 rows — demonstrates functionality but doesn't represent production data volumes. For real use, would need production-scale...
No authentication or user management — suitable for internal tools or demos but requires OAuth/JWT for multi-user deployment. Would need user_id in...
No query history or saved queries — each query is ephemeral. Adding history (localStorage for client-side, PostgreSQL for server-side) would improve UX...
No transaction support for multi-statement operations — each query executes independently. For operations requiring atomicity (create table + insert...
Live schema introspection queries sqlite_master on every request — adds ~5-10ms overhead. For databases with hundreds of tables, would need schema...
No query optimization or EXPLAIN integration — LLM generates functional SQL but may not be optimal (missing indexes, inefficient JOINs). Would benefit...
Would improve: Add result pagination (LIMIT/OFFSET UI controls), implement query caching (Redis) for cost reduction, add user authentication...
Outcome & Impact
Production-ready text-to-SQL system supporting full SQL operation spectrum in real-time — SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, REPLACE,...
Live schema introspection via get_live_schema() querying sqlite_master ensures newly created tables immediately available to LLM — user creates 'notes'...
10 supported SQL operation types with intelligent result formatting: SELECT returns data table with columns and rows, INSERT/UPDATE/DELETE/REPLACE...
12 built-in example queries demonstrating full capabilities across all operation types — onboards users with practical demonstrations: 'Show all...
Modern responsive UI with query-type badges color-coded by operation — blue SELECT (reads), green INSERT (creates), yellow UPDATE (modifies), red...
FastAPI backend with 3 endpoints serving complete workflow: POST /execute-query (natural language → SQL generation via GPT-4o-mini → execution via...
React 19 + Vite frontend with instant query execution — typical <2s latency including OpenAI API call (1-2s generation + <50ms database execution for...
Multi-table SQLite database with foreign key relationships auto-created on startup — customers (15 rows), products (15 rows), orders (20 rows),...
Universal execute_sql() function handles all SQL operation types — SELECT returns List[dict] rows, INSERT/UPDATE/DELETE return int rows affected...
Makefile automation with 6 commands simplifies development workflow — make install (Python + Node deps), make dev (concurrent backend + frontend), make...
GitHub Actions CI/CD with comprehensive validation — Security Check (scans for .env, .db, API keys in tracked files), Backend Tests (Python syntax,...
Comprehensive README documentation with quick start (4 commands: clone, configure .env, make install, make dev), features (full SQL support, live...
Secure .env configuration isolates secrets — OPENAI_API_KEY on backend only (never exposed to frontend, health endpoint returns 'set'/'missing'), .env...
Deployment architecture with decoupled services — backend on Railway/Render/GCP Cloud Run (environment variable OPENAI_API_KEY, start command uvicorn...
Cost-efficient with GPT-4o-mini — ~$0.00008 per query (500 input tokens schema + question, 50 output tokens SQL), 10K queries = $0.80, 100K queries =...
Open source for learning and experimentation — comprehensive codebase demonstrates full-stack AI application (FastAPI backend, React frontend, OpenAI...
Tech Stack
Backend: Python 3.13+, FastAPI (web framework), Uvicorn (ASGI server)
Database: SQLite3 (file-based, 4 tables with foreign keys, 84 rows sample data, live schema introspection via sqlite_master)
LLM: OpenAI GPT-4o-mini (natural language → SQL conversion, schema-aware prompts, temperature 0.3)
Frontend: React 19 (UI library with concurrent features), Vite (build tool + dev server with instant HMR)
HTTP Client: Axios (frontend API calls to backend endpoints)
Validation: Pydantic (request/response models, type safety, automatic OpenAPI docs)
CI/CD: GitHub Actions (security scanning, backend tests, frontend tests, integration checks)
Automation: Makefile (6 commands for install/dev/backend/frontend/stop/clean workflows)
Deployment: Railway or Render or GCP Cloud Run (backend), Vercel or Netlify (frontend), environment variable configuration