Back to Projects

Text-to-SQL AI

Full-stack text-to-SQL application converting natural language into any SQL operation and executing in real-time—SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, REPLACE, PRAGMA, and EXPLAIN. Python 3.13+ FastAPI backend with live schema introspection (get_live_schema() queries sqlite_master, newly created tables immediately available to LLM), OpenAI GPT-4o-mini integration with schema-aware prompts, execute_sql() supporting all SQL operations.

Python 3.13+FastAPIUvicornOpenAI GPT-4o-miniSQLite3React 19ViteJavaScriptGitHub ActionsMakefile

Role

AI Engineer & Full-stack Developer

Team

Solo

Company/Organization

Personal Project

The Problem

Existing text-to-SQL solutions only supported SELECT queries (read-only), limiting users to data retrieval without ability to insert records, update...

Database management tasks required SQL expertiseusers needed to manually write INSERT statements for new records, UPDATE for modifications, DELETE...

No live schema introspection meant newly created tables weren't available to the LLM contextif user created a 'notes' table via natural language,...

DML/DDL operations had no intelligent result formattingINSERT/UPDATE/DELETE returned raw database responses without showing rows affected count....

No unified interface for full database lifecycle managementneeded separate tools for querying (SELECT), data entry (INSERT), updates (UPDATE),...

Natural language variations for database operations weren't handled'add a customer' vs 'insert customer' vs 'create new customer record' all meant...

Existing text-to-SQL demos focused on analytics (SELECT with JOINs, aggregations) but ignored practical database management tasks that non-technical...

The Solution

Built comprehensive text-to-SQL application supporting full SQL operation spectrum with live schema introspection and intelligent result formatting.

Backend (Python 3.13 + FastAPI + SQLite3)

main.py — FastAPI app with CORS middleware and 3 REST endpoints:

`POST /execute-query`Accepts natural language question, calls LLM service to generate SQL (any operation type:...

`GET /schema`Returns live database schema via get_live_schema() (queries sqlite_master for current table list, columns, types, foreign...

`GET /health`Health check returning API status and OPENAI_API_KEY presence (never exposes actual key)

database.py — SQLite3 operations with full SQL support:

init_database() — Creates 4 tables (customers, products, orders, order_items) with foreign key constraints, seeds 84 rows of sample data (15...

execute_sql(sql: str) — Universal SQL execution function:

SELECT: Returns rows as list of dicts with column names

INSERT/UPDATE/DELETE/REPLACE: Returns rows affected count (cursor.rowcount)

CREATE/ALTER/DROP: Returns execution status ('Table created successfully', 'Table altered successfully', 'Table dropped successfully')

PRAGMA: Returns metadata results (table_info, foreign_key_list, index_list)

EXPLAIN: Returns query plan steps

get_live_schema() — Queries sqlite_master table for current database state:

`SELECT name, sql FROM sqlite_master WHERE type='table'` gets all table definitions

For each table, runs `PRAGMA table_info(table_name)` for column metadata

Runs `PRAGMA foreign_key_list(table_name)` for relationship info

Returns JSON with tables, columns, types, foreign keysincludes tables created during session

llm.py — OpenAI GPT-4o-mini integration:

generate_sql(question: str, schema: dict) — Constructs system prompt with live database schema (all tables including newly created ones, column...

System prompt template: "You are a SQL expert. Given this database schema: [live schema JSON with all current tables], convert the following natural...

Temperature 0.3 for balanced SQL generation (deterministic for SELECT/INSERT/UPDATE, slightly creative for CREATE TABLE schema design)

Supports all SQL operation typesno restrictions on query type

models.py — Pydantic models for request/response validation:

`QueryRequest` (question: str) for POST /execute-query

`QueryResponse` (sql: str, results: List[dict] | int | str, operation_type: str, execution_time: float, error: Optional[str])

results type varies: List[dict] for SELECT, int for DML rows affected, str for DDL status

operation_type: 'SELECT' | 'INSERT' | 'UPDATE' | 'DELETE' | 'CREATE' | 'ALTER' | 'DROP' | 'REPLACE' | 'PRAGMA' | 'EXPLAIN'

`SchemaResponse` (tables: dict) for GET /schema with live table info

Frontend (React 19 + Vite)

App.jsx — Main UI component with modern responsive design:

Input field for natural language questions with submit button

12 pre-built suggestion buttons covering all operation types:

SELECT: "Show all customers from USA", "List all products in Electronics category"

INSERT: "Add a customer named Jane from London", "Insert a product named Wireless Mouse at $25"

UPDATE: "Update all pending orders to processing", "Change price of Laptop to $1200"

DELETE: "Delete all cancelled orders", "Remove products with zero stock"

REPLACE: "Upsert a product named Webcam HD at $79.99" (INSERT or UPDATE if exists)

CREATE TABLE: "Create a notes table with id, content, created_at", "Make a tags table with id and name"

ALTER TABLE: "Add a discount_percent column to products", "Add email column to customers"

DROP TABLE: "Drop the notes table", "Delete the tags table"

PRAGMA: "Show column info for the orders table", "List foreign keys in order_items"

EXPLAIN: "Show the query plan for selecting delivered orders", "Explain how to find top customers"

Query-type badges with color coding:

Blue: SELECT (read operations)

Green: INSERT (create new records)

Yellow: UPDATE (modify existing records)

Red: DELETE (remove records)

Purple: CREATE/ALTER/DROP (schema operations)

Teal: PRAGMA/EXPLAIN (metadata and analysis)

Intelligent result display:

SELECT: Data table with columns and rows (responsive, scrollable, sortable)

INSERT/UPDATE/DELETE/REPLACE: Rows affected badge ("3 rows affected", "1 row inserted")

CREATE/ALTER/DROP: Success status message ("Table 'notes' created successfully")

PRAGMA: Metadata table (column info, foreign keys, indexes)

EXPLAIN: Query plan visualization (step-by-step execution strategy)

Generated SQL display with syntax highlighting (color-coded keywords, tables, columns)

Execution time badge (milliseconds)

Error handling with user-friendly messages

api.js — Axios wrapper for backend communication:

`executeQuery(question)`POST /execute-query with natural language question

`getSchema()`GET /schema for live schema info (includes newly created tables)

Base URL from environment variable (VITE_API_URL, defaults to http://localhost:8000)

App.css — Modern responsive styling with dark theme, gradient backgrounds, glassmorphism effect on cards, color-coded badges, smooth transitions,...

Live Schema Introspection

Key innovationnewly created tables are immediately available to LLM:

User enters: "Create a notes table with id, content, created_at"

LLM generates: `CREATE TABLE notes (id INTEGER PRIMARY KEY, content TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)`

execute_sql() runs CREATE TABLE, updates database

Next query: "Add a note saying 'Hello World'"

get_live_schema() queries sqlite_master, finds 'notes' table in current database state

LLM receives updated schema including 'notes' table in system prompt

LLM generates: `INSERT INTO notes (content) VALUES ('Hello World')`

Works immediately without application restart

Database Schema (SQLite3)

4 tables with foreign key relationships (auto-created on startup):

customers (15 rows) — id, name, email, country, created_at

products (15 rows) — id, name, category, price, stock, created_at

orders (20 rows) — id, customer_id (FK → customers), order_date, status (pending/shipped/delivered), total_amount

order_items (34 rows) — id, order_id (FK → orders), product_id (FK → products), quantity, price

Relationship flow: customers → orders → order_items ← products (many-to-many via order_items junction table)

Users can create additional tables via natural languageall immediately available to LLM.

Supported SQL Operations (10 types)

| Type | Keywords | Example Prompt | Generated SQL | |------|----------|----------------|---------------| | SELECT | `SELECT`, `WITH` | "Show all...

Makefile Automation (6 commands)

`make install`Install Python dependencies (requirements.txt) + Node dependencies (frontend/package.json)

`make dev`Run backend (uvicorn on port 8000) + frontend (Vite on port 5173) concurrently

`make backend`Backend only (uvicorn main:app --reload --port 8000)

`make frontend`Frontend only (cd frontend && npm run dev)

`make stop`Kill dev server processes (backend/frontend)

`make clean`Remove build artifacts (__pycache__, frontend/dist/, database.db, node_modules, .ruff_cache)

GitHub Actions CI/CD (.github/workflows/ci.yml)

Runs on every push and PR:

Security Check — Scans for `.env`, `.db`, API keys in tracked files (fails if found)

Backend Tests — Python syntax compilation, Black formatting check, mypy type checking, Bandit security scan

Frontend Tests — ESLint code quality, Vite production build verification

Integration Check — Required files present (.env.example safety check, README exists, Makefile commands valid)

Deployment Architecture

Backend → Railway / Render / GCP Cloud Run:

Set environment variable `OPENAI_API_KEY` via platform dashboard

Deploy from GitHub repo (auto-deploy on push to main)

Start command: `uvicorn main:app --host 0.0.0.0 --port $PORT`

Health check endpoint: `/health` for container readiness

Frontend → Vercel / Netlify:

Set root directory to `frontend`

Set environment variable `VITE_API_URL` to backend URL

Build command: `npm run build`

Output directory: `dist`

Deploy from GitHub repo (auto-deploy on push to main)

Security Configuration

API key server-only: `OPENAI_API_KEY` loaded from environment on backend, never sent to frontend. /health returns 'OPENAI_API_KEY: set' or 'missing'...

.env gitignored: `.gitignore` blocks `.env`, `.env.local`, `.env.*`, `*.db`, `*.sqlite` ensuring secrets and database never committed. `.env.example`...

Database files gitignored: `database.db`, `*.sqlite` blocked to prevent committing production data

No SQL injection neededLLM generates trusted SQL, application controls generation context (schema-aware prompts), regex validation optional (can...

CORS configuration: Backend explicitly allows frontend origin (localhost:5173 in dev, Vercel URL in prod)

Cost Analysis

GPT-4o-mini pricing (~$0.15 per 1M input tokens, ~$0.60 per 1M output tokens):

Average query: 500 input tokens (schema + question) + 50 output tokens (SQL)

Cost per query: ~$0.00008 (0.008 cents)

10,000 queries: ~$0.80

100,000 queries: ~$8.00

Extremely cost-efficient for production use.

Design Decisions

Chose GPT-4o-mini over GPT-3.5-turbo for SQL generationsame cost tier but better instruction following for complex CREATE TABLE statements, ALTER...

Implemented full SQL support (all 10 operation types) instead of read-only SELECTenables complete database lifecycle management via natural...

Built live schema introspection via get_live_schema() querying sqlite_masternewly created tables immediately available to LLM without application...

Designed intelligent result formatting based on operation typeSELECT returns data table, INSERT/UPDATE/DELETE show rows affected count,...

Added query-type badges with color coding (blue SELECT, green INSERT, yellow UPDATE, red DELETE, purple CREATE/ALTER/DROP, teal...

Used universal execute_sql() function handling all operation types instead of separate functions per typesimpler architecture, easier to maintain,...

Schema-aware prompts with live table definitions, column types, foreign keys sent to OpenAIensures generated SQL references valid tables/columns...

React 19 + Vite over Next.jssimpler for single-page interactive UI, instant HMR during development, no SSR needed for database management interface....

12 example queries covering all operation typesonboards users with demonstrations of full capabilities (not just SELECT). Shows practical use cases:...

Temperature 0.3 for balanced SQL generationdeterministic enough for SELECT/INSERT/UPDATE consistency (same question → same query), creative enough...

Makefile for workflow automation6 commands (install, dev, backend, frontend, stop, clean) provide consistent setup. Reduced from 10 to 6 by removing...

GitHub Actions CI/CD with comprehensive checkssecurity scan (no .env/.db files), backend tests (syntax/formatting/types/security), frontend tests...

Pydantic models with operation_type fieldfrontend knows whether to display data table (SELECT), rows affected (DML), or status message (DDL)....

No SQL injection protection neededLLM generates trusted SQL from controlled context (schema-aware prompts), application doesn't accept arbitrary SQL...

Tradeoffs & Constraints

SQLite3 single-connection writessufficient for moderate-scale use (<100 concurrent users) but wouldn't scale to high-concurrency write workloads...

OpenAI API latencytypical 1-2s response time for SQL generation (GPT-4o-mini) adds to end-to-end query latency. Caching common questions (Redis with...

No query result paginationreturns all rows from SELECT queries. For large result sets (10K+ rows), would need LIMIT/OFFSET pagination or...

Sample database with 84 rowsdemonstrates functionality but doesn't represent production data volumes. For real use, would need production-scale...

No authentication or user managementsuitable for internal tools or demos but requires OAuth/JWT for multi-user deployment. Would need user_id in...

No query history or saved querieseach query is ephemeral. Adding history (localStorage for client-side, PostgreSQL for server-side) would improve UX...

No transaction support for multi-statement operationseach query executes independently. For operations requiring atomicity (create table + insert...

Live schema introspection queries sqlite_master on every requestadds ~5-10ms overhead. For databases with hundreds of tables, would need schema...

No query optimization or EXPLAIN integrationLLM generates functional SQL but may not be optimal (missing indexes, inefficient JOINs). Would benefit...

Would improve: Add result pagination (LIMIT/OFFSET UI controls), implement query caching (Redis) for cost reduction, add user authentication...

Outcome & Impact

Production-ready text-to-SQL system supporting full SQL operation spectrum in real-timeSELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, REPLACE,...

Live schema introspection via get_live_schema() querying sqlite_master ensures newly created tables immediately available to LLMuser creates 'notes'...

10 supported SQL operation types with intelligent result formatting: SELECT returns data table with columns and rows, INSERT/UPDATE/DELETE/REPLACE...

12 built-in example queries demonstrating full capabilities across all operation typesonboards users with practical demonstrations: 'Show all...

Modern responsive UI with query-type badges color-coded by operationblue SELECT (reads), green INSERT (creates), yellow UPDATE (modifies), red...

FastAPI backend with 3 endpoints serving complete workflow: POST /execute-query (natural language → SQL generation via GPT-4o-mini → execution via...

React 19 + Vite frontend with instant query executiontypical <2s latency including OpenAI API call (1-2s generation + <50ms database execution for...

Multi-table SQLite database with foreign key relationships auto-created on startupcustomers (15 rows), products (15 rows), orders (20 rows),...

Universal execute_sql() function handles all SQL operation typesSELECT returns List[dict] rows, INSERT/UPDATE/DELETE return int rows affected...

Makefile automation with 6 commands simplifies development workflowmake install (Python + Node deps), make dev (concurrent backend + frontend), make...

GitHub Actions CI/CD with comprehensive validationSecurity Check (scans for .env, .db, API keys in tracked files), Backend Tests (Python syntax,...

Comprehensive README documentation with quick start (4 commands: clone, configure .env, make install, make dev), features (full SQL support, live...

Secure .env configuration isolates secretsOPENAI_API_KEY on backend only (never exposed to frontend, health endpoint returns 'set'/'missing'), .env...

Deployment architecture with decoupled servicesbackend on Railway/Render/GCP Cloud Run (environment variable OPENAI_API_KEY, start command uvicorn...

Cost-efficient with GPT-4o-mini~$0.00008 per query (500 input tokens schema + question, 50 output tokens SQL), 10K queries = $0.80, 100K queries =...

Open source for learning and experimentationcomprehensive codebase demonstrates full-stack AI application (FastAPI backend, React frontend, OpenAI...

Tech Stack

Backend: Python 3.13+, FastAPI (web framework), Uvicorn (ASGI server)

Database: SQLite3 (file-based, 4 tables with foreign keys, 84 rows sample data, live schema introspection via sqlite_master)

LLM: OpenAI GPT-4o-mini (natural language → SQL conversion, schema-aware prompts, temperature 0.3)

Frontend: React 19 (UI library with concurrent features), Vite (build tool + dev server with instant HMR)

HTTP Client: Axios (frontend API calls to backend endpoints)

Validation: Pydantic (request/response models, type safety, automatic OpenAPI docs)

CI/CD: GitHub Actions (security scanning, backend tests, frontend tests, integration checks)

Automation: Makefile (6 commands for install/dev/backend/frontend/stop/clean workflows)

Deployment: Railway or Render or GCP Cloud Run (backend), Vercel or Netlify (frontend), environment variable configuration

Back to Projects