Text-to-SQL AI
A full-stack text-to-SQL application that converts natural language into any SQL operation and executes it in real time: SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, REPLACE, PRAGMA, and EXPLAIN. The Python 3.13+ FastAPI backend provides live schema introspection (get_live_schema() queries sqlite_master, so newly created tables are immediately available to the LLM), OpenAI GPT-4o-mini integration with schema-aware prompts, and a universal execute_sql() supporting all SQL operations.
Role
AI Engineer & Full-stack Developer
Team
Solo
Company/Organization
Personal Project
The Problem
Existing text-to-SQL solutions only supported SELECT queries (read-only), limiting users to data retrieval without ability to insert records, update...
Database management tasks required SQL expertise — users needed to manually write INSERT statements for new records, UPDATE for modifications, DELETE...
No live schema introspection meant newly created tables weren't available to the LLM context — if user created a 'notes' table via natural language,...
DML/DDL operations had no intelligent result formatting — INSERT/UPDATE/DELETE returned raw database responses without showing rows affected count....
No unified interface for full database lifecycle management — needed separate tools for querying (SELECT), data entry (INSERT), updates (UPDATE),...
Natural language variations for database operations weren't handled — 'add a customer' vs 'insert customer' vs 'create new customer record' all meant...
Existing text-to-SQL demos focused on analytics (SELECT with JOINs, aggregations) but ignored practical database management tasks that non-technical...
The Solution
Built comprehensive text-to-SQL application supporting full SQL operation spectrum with live schema introspection and intelligent result formatting.
Backend (Python 3.13 + FastAPI + SQLite3)
main.py — FastAPI app with CORS middleware and 3 REST endpoints:
`POST /execute-query` — Accepts natural language question, calls LLM service to generate SQL (any operation type:...
`GET /schema` — Returns live database schema via get_live_schema() (queries sqlite_master for current table list, columns, types, foreign...
`GET /health` — Health check returning API status and OPENAI_API_KEY presence (never exposes actual key)
database.py — SQLite3 operations with full SQL support:
init_database() — Creates 4 tables (customers, products, orders, order_items) with foreign key constraints, seeds 84 rows of sample data (15...
execute_sql(sql: str) — Universal SQL execution function:
SELECT: Returns rows as list of dicts with column names
INSERT/UPDATE/DELETE/REPLACE: Returns rows affected count (cursor.rowcount)
CREATE/ALTER/DROP: Returns execution status ('Table created successfully', 'Table altered successfully', 'Table dropped successfully')
PRAGMA: Returns metadata results (table_info, foreign_key_list, index_list)
EXPLAIN: Returns query plan steps
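The per-operation dispatch above can be sketched with the stdlib sqlite3 module. The status-message wording is illustrative, and the in-memory connection stands in for the real database.db file:

```python
import sqlite3

_conn = sqlite3.connect(":memory:")  # the real app opens database.db

def execute_sql(sql: str):
    """Run any SQL statement; shape the result by operation type (sketch)."""
    op = sql.strip().split()[0].upper()
    cur = _conn.execute(sql)
    if op in ("SELECT", "PRAGMA", "EXPLAIN"):
        # Result-set operations: rows as list of dicts keyed by column name.
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]
    if op in ("INSERT", "UPDATE", "DELETE", "REPLACE"):
        _conn.commit()
        return cur.rowcount  # rows affected
    # DDL (CREATE / ALTER / DROP): return a status string.
    _conn.commit()
    return f"{op.title()} statement executed successfully"
```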
get_live_schema() — Queries sqlite_master table for current database state:
`SELECT name, sql FROM sqlite_master WHERE type='table'` gets all table definitions
For each table, runs `PRAGMA table_info(table_name)` for column metadata
Runs `PRAGMA foreign_key_list(table_name)` for relationship info
Returns JSON with tables, columns, types, foreign keys — includes tables created during session
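The introspection steps above can be sketched directly against sqlite3, relying on the documented tuple layouts of `PRAGMA table_info` (cid, name, type, notnull, dflt_value, pk) and `PRAGMA foreign_key_list` (id, seq, table, from, to, ...); the exact output shape here is illustrative:

```python
import sqlite3

def get_live_schema(conn: sqlite3.Connection) -> dict:
    """Build a schema dict from the database's current state."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (name,) in tables:
        cols = conn.execute(f"PRAGMA table_info({name})").fetchall()
        fks = conn.execute(f"PRAGMA foreign_key_list({name})").fetchall()
        schema[name] = {
            "columns": [{"name": c[1], "type": c[2]} for c in cols],
            "foreign_keys": [
                {"column": fk[3], "references": f"{fk[2]}({fk[4]})"} for fk in fks
            ],
        }
    return schema
```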
llm.py — OpenAI GPT-4o-mini integration:
generate_sql(question: str, schema: dict) — Constructs system prompt with live database schema (all tables including newly created ones, column...
System prompt template: "You are a SQL expert. Given this database schema: [live schema JSON with all current tables], convert the following natural...
Temperature 0.3 for balanced SQL generation (deterministic for SELECT/INSERT/UPDATE, slightly creative for CREATE TABLE schema design)
Supports all SQL operation types — no restrictions on query type
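A sketch of the llm.py flow, assuming the openai>=1.x client API; the prompt wording only approximates the template quoted above, and the import is done lazily so the sketch loads without the package installed:

```python
import json

def build_prompt(schema: dict) -> str:
    """System prompt carrying the live schema (wording is approximate)."""
    return (
        "You are a SQL expert. Given this database schema: "
        + json.dumps(schema)
        + ", convert the following natural language question into a single "
          "SQLite statement. Return only the SQL, no explanation."
    )

def generate_sql(question: str, schema: dict) -> str:
    from openai import OpenAI  # lazy import; reads OPENAI_API_KEY from env
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.3,  # deterministic for DML, some latitude for DDL design
        messages=[
            {"role": "system", "content": build_prompt(schema)},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content.strip()
```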
models.py — Pydantic models for request/response validation:
`QueryRequest` (question: str) for POST /execute-query
`QueryResponse` (sql: str, results: List[dict] | int | str, operation_type: str, execution_time: float, error: Optional[str])
results type varies: List[dict] for SELECT, int for DML rows affected, str for DDL status
operation_type: 'SELECT' | 'INSERT' | 'UPDATE' | 'DELETE' | 'CREATE' | 'ALTER' | 'DROP' | 'REPLACE' | 'PRAGMA' | 'EXPLAIN'
`SchemaResponse` (tables: dict) for GET /schema with live table info
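The operation_type field can be derived from the generated statement's leading keyword; a minimal sketch (the helper name is illustrative), treating `WITH ... SELECT` CTEs as SELECT per the keyword mapping in the operations table below:

```python
def classify_operation(sql: str) -> str:
    """Map a SQL statement to its operation_type by leading keyword."""
    first = sql.lstrip().split(None, 1)[0].upper()
    # A WITH-prefixed CTE is still a read query.
    return "SELECT" if first == "WITH" else first
```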
Frontend (React 19 + Vite)
App.jsx — Main UI component with modern responsive design:
Input field for natural language questions with submit button
12 pre-built suggestion buttons covering all operation types:
SELECT: "Show all customers from USA", "List all products in Electronics category"
INSERT: "Add a customer named Jane from London", "Insert a product named Wireless Mouse at $25"
UPDATE: "Update all pending orders to processing", "Change price of Laptop to $1200"
DELETE: "Delete all cancelled orders", "Remove products with zero stock"
REPLACE: "Upsert a product named Webcam HD at $79.99" (INSERT or UPDATE if exists)
CREATE TABLE: "Create a notes table with id, content, created_at", "Make a tags table with id and name"
ALTER TABLE: "Add a discount_percent column to products", "Add email column to customers"
DROP TABLE: "Drop the notes table", "Delete the tags table"
PRAGMA: "Show column info for the orders table", "List foreign keys in order_items"
EXPLAIN: "Show the query plan for selecting delivered orders", "Explain how to find top customers"
Query-type badges with color coding:
Blue: SELECT (read operations)
Green: INSERT (create new records)
Yellow: UPDATE (modify existing records)
Red: DELETE (remove records)
Purple: CREATE/ALTER/DROP (schema operations)
Teal: PRAGMA/EXPLAIN (metadata and analysis)
Intelligent result display:
SELECT: Data table with columns and rows (responsive, scrollable, sortable)
INSERT/UPDATE/DELETE/REPLACE: Rows affected badge ("3 rows affected", "1 row inserted")
CREATE/ALTER/DROP: Success status message ("Table 'notes' created successfully")
PRAGMA: Metadata table (column info, foreign keys, indexes)
EXPLAIN: Query plan visualization (step-by-step execution strategy)
Generated SQL display with syntax highlighting (color-coded keywords, tables, columns)
Execution time badge (milliseconds)
Error handling with user-friendly messages
api.js — Axios wrapper for backend communication:
`executeQuery(question)` — POST /execute-query with natural language question
`getSchema()` — GET /schema for live schema info (includes newly created tables)
Base URL from environment variable (VITE_API_URL, defaults to http://localhost:8000)
App.css — Modern responsive styling with dark theme, gradient backgrounds, glassmorphism effect on cards, color-coded badges, smooth transitions,...
Live Schema Introspection
Key innovation — newly created tables are immediately available to LLM:
User enters: "Create a notes table with id, content, created_at"
LLM generates: `CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)`
execute_sql() runs CREATE TABLE, updates database
Next query: "Add a note saying 'Hello World'"
get_live_schema() queries sqlite_master, finds 'notes' table in current database state
LLM receives updated schema including 'notes' table in system prompt
LLM generates: `INSERT INTO notes (content) VALUES ('Hello World')`
Works immediately without application restart
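The walkthrough above can be demonstrated end to end with plain sqlite3 (the LLM steps replaced by their generated SQL): a table created mid-session is visible to the next introspection pass before the follow-up INSERT is generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def current_tables(conn):
    """What the LLM would see in its next prompt: the live table list."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    return {name for (name,) in rows}

# Steps 1-3: the LLM's generated CREATE TABLE is executed.
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT, "
    "created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
)
# Steps 4-7: re-introspection before the next question already sees 'notes',
# so the follow-up INSERT targets a valid table.
assert "notes" in current_tables(conn)
conn.execute("INSERT INTO notes (content) VALUES ('Hello World')")
```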
Database Schema (SQLite3)
4 tables with foreign key relationships (auto-created on startup):
customers (15 rows) — id, name, email, country, created_at
products (15 rows) — id, name, category, price, stock, created_at
orders (20 rows) — id, customer_id (FK → customers), order_date, status (pending/shipped/delivered), total_amount
order_items (34 rows) — id, order_id (FK → orders), product_id (FK → products), quantity, price
Relationship flow: customers → orders → order_items ← products (many-to-many via order_items junction table)
Users can create additional tables via natural language — all immediately available to LLM.
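The four-table schema can be sketched as DDL inferred from the column lists above; column types, NOT NULL choices, and REFERENCES clauses are assumptions, not the project's exact definitions:

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT,
    country TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price REAL,
    stock INTEGER,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    order_date TEXT,
    status TEXT,
    total_amount REAL
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    price REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```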
Supported SQL Operations (10 types)
| Type | Keywords | Example Prompt | Generated SQL |
|------|----------|----------------|---------------|
| SELECT | `SELECT`, `WITH` | "Show all... | |
Makefile Automation (6 commands)
`make install` — Install Python dependencies (requirements.txt) + Node dependencies (frontend/package.json)
`make dev` — Run backend (uvicorn on port 8000) + frontend (Vite on port 5173) concurrently
`make backend` — Backend only (uvicorn main:app --reload --port 8000)
`make frontend` — Frontend only (cd frontend && npm run dev)
`make stop` — Kill dev server processes (backend/frontend)
`make clean` — Remove build artifacts (__pycache__, frontend/dist/, database.db, node_modules, .ruff_cache)
GitHub Actions CI/CD (.github/workflows/ci.yml)
Runs on every push and PR:
Security Check — Scans for `.env`, `.db`, API keys in tracked files (fails if found)
Backend Tests — Python syntax compilation, Black formatting check, mypy type checking, Bandit security scan
Frontend Tests — ESLint code quality, Vite production build verification
Integration Check — Required files present (.env.example safety check, README exists, Makefile commands valid)
Deployment Architecture
Backend → Railway / Render / GCP Cloud Run:
Set environment variable `OPENAI_API_KEY` via platform dashboard
Deploy from GitHub repo (auto-deploy on push to main)
Start command: `uvicorn main:app --host 0.0.0.0 --port $PORT`
Health check endpoint: `/health` for container readiness
Frontend → Vercel / Netlify:
Set root directory to `frontend`
Set environment variable `VITE_API_URL` to backend URL
Build command: `npm run build`
Output directory: `dist`
Deploy from GitHub repo (auto-deploy on push to main)
Security Configuration
API key server-only: `OPENAI_API_KEY` loaded from environment on backend, never sent to frontend. /health returns 'OPENAI_API_KEY: set' or 'missing'...
.env gitignored: `.gitignore` blocks `.env`, `.env.local`, `.env.*`, `*.db`, `*.sqlite` ensuring secrets and database never committed. `.env.example`...
Database files gitignored: `database.db`, `*.sqlite` blocked to prevent committing production data
No SQL injection protection needed: the LLM generates trusted SQL, the application controls the generation context (schema-aware prompts), regex validation optional (can...
CORS configuration: Backend explicitly allows frontend origin (localhost:5173 in dev, Vercel URL in prod)
Cost Analysis
GPT-4o-mini pricing (~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens):
Average query: 500 input tokens (schema + question) + 50 output tokens (SQL)
Cost per query: ~$0.0001 ($0.000075 input + $0.00003 output, about 0.01 cents)
10,000 queries: ~$1.05
100,000 queries: ~$10.50
Extremely cost-efficient for production use.
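A quick check of the per-query arithmetic under the stated pricing and token counts:

```python
# GPT-4o-mini list prices as stated above (USD per 1M tokens).
INPUT_PER_M, OUTPUT_PER_M = 0.15, 0.60
input_tokens, output_tokens = 500, 50  # schema + question, generated SQL

cost = input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M
# 0.000075 + 0.00003 = 0.000105, i.e. roughly $0.0001 per query
```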
Design Decisions
Chose GPT-4o-mini over GPT-3.5-turbo for SQL generation — same cost tier but better instruction following for complex CREATE TABLE statements, ALTER...
Implemented full SQL support (all 10 operation types) instead of read-only SELECT — enables complete database lifecycle management via natural...
Built live schema introspection via get_live_schema() querying sqlite_master — newly created tables immediately available to LLM without application...
Designed intelligent result formatting based on operation type — SELECT returns data table, INSERT/UPDATE/DELETE show rows affected count,...
Added query-type badges with color coding (blue SELECT, green INSERT, yellow UPDATE, red DELETE, purple CREATE/ALTER/DROP, teal...
Used universal execute_sql() function handling all operation types instead of separate functions per type — simpler architecture, easier to maintain,...
Schema-aware prompts with live table definitions, column types, foreign keys sent to OpenAI — ensures generated SQL references valid tables/columns...
React 19 + Vite over Next.js — simpler for single-page interactive UI, instant HMR during development, no SSR needed for database management interface....
12 example queries covering all operation types — onboards users with demonstrations of full capabilities (not just SELECT). Shows practical use cases:...
Temperature 0.3 for balanced SQL generation — deterministic enough for SELECT/INSERT/UPDATE consistency (same question → same query), creative enough...
Makefile for workflow automation — 6 commands (install, dev, backend, frontend, stop, clean) provide consistent setup. Reduced from 10 to 6 by removing...
GitHub Actions CI/CD with comprehensive checks — security scan (no .env/.db files), backend tests (syntax/formatting/types/security), frontend tests...
Pydantic models with operation_type field — frontend knows whether to display data table (SELECT), rows affected (DML), or status message (DDL)....
No SQL injection protection needed — LLM generates trusted SQL from controlled context (schema-aware prompts), application doesn't accept arbitrary SQL...
Tradeoffs & Constraints
SQLite3 single-connection writes — sufficient for moderate-scale use (<100 concurrent users) but wouldn't scale to high-concurrency write workloads...
OpenAI API latency — typical 1-2s response time for SQL generation (GPT-4o-mini) adds to end-to-end query latency. Caching common questions (Redis with...
No query result pagination — returns all rows from SELECT queries. For large result sets (10K+ rows), would need LIMIT/OFFSET pagination or...
Sample database with 84 rows — demonstrates functionality but doesn't represent production data volumes. For real use, would need production-scale...
No authentication or user management — suitable for internal tools or demos but requires OAuth/JWT for multi-user deployment. Would need user_id in...
No query history or saved queries — each query is ephemeral. Adding history (localStorage for client-side, PostgreSQL for server-side) would improve UX...
No transaction support for multi-statement operations — each query executes independently. For operations requiring atomicity (create table + insert...
Live schema introspection queries sqlite_master on every request — adds ~5-10ms overhead. For databases with hundreds of tables, would need schema...
No query optimization or EXPLAIN integration — LLM generates functional SQL but may not be optimal (missing indexes, inefficient JOINs). Would benefit...
Would improve: Add result pagination (LIMIT/OFFSET UI controls), implement query caching (Redis) for cost reduction, add user authentication...
Outcome & Impact
Production-ready text-to-SQL system supporting full SQL operation spectrum in real-time — SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, REPLACE,...
Live schema introspection via get_live_schema() querying sqlite_master ensures newly created tables immediately available to LLM — user creates 'notes'...
10 supported SQL operation types with intelligent result formatting: SELECT returns data table with columns and rows, INSERT/UPDATE/DELETE/REPLACE...
12 built-in example queries demonstrating full capabilities across all operation types — onboards users with practical demonstrations: 'Show all...
Modern responsive UI with query-type badges color-coded by operation — blue SELECT (reads), green INSERT (creates), yellow UPDATE (modifies), red...
FastAPI backend with 3 endpoints serving complete workflow: POST /execute-query (natural language → SQL generation via GPT-4o-mini → execution via...
React 19 + Vite frontend with instant query execution — typical <2s latency including OpenAI API call (1-2s generation + <50ms database execution for...
Multi-table SQLite database with foreign key relationships auto-created on startup — customers (15 rows), products (15 rows), orders (20 rows),...
Universal execute_sql() function handles all SQL operation types — SELECT returns List[dict] rows, INSERT/UPDATE/DELETE return int rows affected...
Makefile automation with 6 commands simplifies development workflow — make install (Python + Node deps), make dev (concurrent backend + frontend), make...
GitHub Actions CI/CD with comprehensive validation — Security Check (scans for .env, .db, API keys in tracked files), Backend Tests (Python syntax,...
Comprehensive README documentation with quick start (4 commands: clone, configure .env, make install, make dev), features (full SQL support, live...
Secure .env configuration isolates secrets — OPENAI_API_KEY on backend only (never exposed to frontend, health endpoint returns 'set'/'missing'), .env...
Deployment architecture with decoupled services — backend on Railway/Render/GCP Cloud Run (environment variable OPENAI_API_KEY, start command uvicorn...
Cost-efficient with GPT-4o-mini — ~$0.00008 per query (500 input tokens schema + question, 50 output tokens SQL), 10K queries = $0.80, 100K queries =...
Open source for learning and experimentation — comprehensive codebase demonstrates full-stack AI application (FastAPI backend, React frontend, OpenAI...
Tech Stack
Backend: Python 3.13+, FastAPI (web framework), Uvicorn (ASGI server)
Database: SQLite3 (file-based, 4 tables with foreign keys, 84 rows sample data, live schema introspection via sqlite_master)
LLM: OpenAI GPT-4o-mini (natural language → SQL conversion, schema-aware prompts, temperature 0.3)
Frontend: React 19 (UI library with concurrent features), Vite (build tool + dev server with instant HMR)
HTTP Client: Axios (frontend API calls to backend endpoints)
Validation: Pydantic (request/response models, type safety, automatic OpenAPI docs)
CI/CD: GitHub Actions (security scanning, backend tests, frontend tests, integration checks)
Automation: Makefile (6 commands for install/dev/backend/frontend/stop/clean workflows)
Deployment: Railway or Render or GCP Cloud Run (backend), Vercel or Netlify (frontend), environment variable configuration