RAG PDF Chatbot
Retrieval-Augmented Generation (RAG) chatbot for intelligent question-answering over technical PDF documents. It combines FAISS (CPU) vector search, intent-based retrieval across 6 query types (figure, table, page, section, general, comparison), and Google Gemini (gemini-embedding-001 for embeddings, gemini-2.5-flash for generation) to produce accurate, source-cited answers with High/Medium/Low confidence scoring derived from source quality, chunk count, semantic match, and verbatim presence.
Role
AI Engineer & Full-stack Developer
Team
Solo
Company/Organization
YNM Safety
The Problem
Finding specific information in large technical PDF documents (hundreds of pages with figures, tables, specifications) required manual searching — no...
Traditional keyword search (Ctrl+F) only matched exact text, missing semantic meaning and context. Searching 'crash barrier specifications' wouldn't...
Difficult to locate specific figures, tables, or page references without knowing exact locations. Questions like 'What does Fig 3.3 show?' or...
No confidence scoring to help users assess answer reliability and accuracy. Users couldn't gauge if retrieved information was definitive (high...
Existing RAG solutions lacked intent-aware retrieval — couldn't distinguish between requests for specific figures/tables (exact match needed) vs...
Commercial document Q&A services (DocuSign Insight, Adobe Liquid Mode) were expensive ($500+/month) or required proprietary APIs with vendor lock-in.
No modern UI for technical PDF Q&A — needed dark/light theme for different reading environments, multi-chat support for organizing questions by topic,...
The Solution
Built a comprehensive RAG chatbot with a 5-stage pipeline and a security-first architecture.
RAG Pipeline (5 Stages)
Intent Classification — intent_classifier.py uses pattern matching and keyword detection to classify queries into 6 types:
- FIGURE_QUERY: 'What does Fig 3.3 show?' → Extract figure number, exact match in metadata
- TABLE_QUERY: 'Table 6.2 guidelines' → Extract table number, exact match
- PAGE_QUERY: 'What is on page 27?' → Extract page number, retrieve chunks from that page
- SECTION_QUERY: 'What is in Section 4?' → Extract section identifier, match section headers
- GENERAL_QUERY: 'Size of STOP sign?' → FAISS semantic search with k=5 nearest neighbors
- COMPARISON_QUERY: 'Compare Fig 3.1 and 3.2' → Extract both references, retrieve both, compare
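The pattern-matching approach above can be sketched in a few lines of standard-library Python. This is a hypothetical reconstruction, not the actual intent_classifier.py: the regexes, ordering, and return shape are illustrative assumptions.

```python
import re

# Illustrative patterns; the real intent_classifier.py may differ.
# Comparison is checked first so "Compare Fig 3.1 and 3.2" is not
# misclassified as a plain figure query.
PATTERNS = [
    ("COMPARISON_QUERY", re.compile(r"\bcompare\b", re.I)),
    ("FIGURE_QUERY", re.compile(r"\bfig(?:ure)?\.?\s*(\d+(?:\.\d+)*)", re.I)),
    ("TABLE_QUERY", re.compile(r"\btable\s*(\d+(?:\.\d+)*)", re.I)),
    ("PAGE_QUERY", re.compile(r"\bpage\s*(\d+)", re.I)),
    ("SECTION_QUERY", re.compile(r"\bsection\s*(\d+(?:\.\d+)*)", re.I)),
]

def classify_intent(question: str):
    """Return (intent, extracted entity) for a user question.

    Anything that matches no pattern falls back to GENERAL_QUERY,
    which routes to FAISS semantic search. For comparisons the real
    system extracts both references; this sketch only flags the intent.
    """
    for intent, pattern in PATTERNS:
        m = pattern.search(question)
        if m:
            entity = m.group(1) if m.groups() else None
            return intent, entity
    return "GENERAL_QUERY", None
```

Ordering the patterns from most to least specific is what lets a cheap regex pass stand in for a learned classifier on this kind of constrained query set.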
Retrieval — Based on the classified intent, apply the appropriate strategy:
- Exact match (figures/tables/pages): Query metadata.json for exact reference, return specific chunk
- FAISS semantic search (general/section): Embed query with gemini-embedding-001, query FAISS index (CPU-based approximate nearest neighbor), return...
- Vision captions: If page has images, load from vision_captions.json (pre-generated with Gemini Vision API on page images)
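What FAISS IndexFlatL2 computes for the semantic path is exact (brute-force) squared-L2 nearest neighbours. The numpy sketch below illustrates that computation only; the real code calls faiss.IndexFlatL2.search on gemini-embedding-001 vectors.

```python
import numpy as np

def exact_l2_search(index_vectors, query_vector, k=5):
    """Brute-force nearest-neighbour search equivalent to FAISS
    IndexFlatL2: squared L2 distance from the query to every stored
    embedding, then the k smallest distances."""
    diffs = index_vectors - query_vector          # shape (n, d)
    dists = np.einsum("nd,nd->n", diffs, diffs)   # squared L2 per row
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Tiny demo: 4 stored "embeddings", 2-D for readability
vecs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
ids, dists = exact_l2_search(vecs, np.array([0.9, 0.1]), k=2)
```

Because every vector is scanned, there is no approximation error, which is the accuracy/scale tradeoff discussed later under Tradeoffs & Constraints.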
Context Building — Assemble retrieved chunks into context:
- Sort by relevance score (FAISS distance or exact match priority)
- Include chunk text, page numbers, figure/table identifiers, section headers
- Add surrounding chunks if /expand-context called (shows context before/after matched chunk)
- Limit total context to ~4000 tokens to fit Gemini context window
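The context-assembly step above can be sketched as follows. The chunk keys ('text', 'page', 'score', 'figure_id') and the words-to-tokens ratio are assumptions for illustration; the real pipeline uses its own metadata schema and token accounting.

```python
def build_context(chunks, token_budget=4000):
    """Assemble retrieved chunks into a prompt context: sort by
    relevance (lower FAISS distance = better), prefix each chunk with
    its page/figure identifiers, and stop before exceeding the budget.
    Token count is approximated as words * 4/3, a common rule of thumb,
    since the real tokenizer is not part of this sketch."""
    used = 0
    parts = []
    for chunk in sorted(chunks, key=lambda c: c["score"]):
        header = f"[page {chunk['page']}"
        if chunk.get("figure_id"):
            header += f", {chunk['figure_id']}"
        header += "]"
        block = f"{header}\n{chunk['text']}"
        approx_tokens = int(len(block.split()) * 4 / 3)
        if used + approx_tokens > token_budget:
            break
        parts.append(block)
        used += approx_tokens
    return "\n\n".join(parts)
```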
LLM Generation — Send context to Google Gemini gemini-2.5-flash:
- Structured prompt: "Based on the following PDF excerpts, answer the question. Provide structured JSON with 'answer' (paragraphs array or lists),...
- Temperature 0.3 for factual accuracy (low creativity, high consistency)
- Max tokens 1024 for detailed answers
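A minimal sketch of the prompt and generation settings described above. The exact prompt wording in app.py is not reproduced here (it is truncated in this writeup), so this version is an assumption; the settings would be passed to the google-generativeai client as a generation config, shown as a plain dict to keep the sketch self-contained.

```python
def build_prompt(question: str, context: str) -> str:
    """Hypothetical structured prompt in the spirit of the one
    described above; the real wording in app.py may differ."""
    return (
        "Based on the following PDF excerpts, answer the question.\n"
        "Respond with JSON containing 'answer' (array of paragraphs or "
        "lists) and 'sources' (page numbers, figure/table IDs).\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )

# Settings mirroring the text: low temperature for factual consistency,
# capped output length for detailed but bounded answers.
GENERATION_CONFIG = {"temperature": 0.3, "max_output_tokens": 1024}
```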
Response — Parse Gemini JSON response, calculate confidence score, return to frontend:
- Multi-factor confidence: source quality (primary source = High, secondary = Medium), chunk count (1 chunk = Medium, 3+ = High), semantic match...
- Final confidence: average of factors mapped to High/Medium/Low
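The multi-factor averaging above can be sketched like this. The individual factor thresholds (what counts as "primary", the 0.8/0.5 semantic-match cutoffs) are assumptions, not the exact rules in the real code; the structure (map each factor to a level, average, map back) follows the description.

```python
LEVELS = {"High": 3, "Medium": 2, "Low": 1}

def confidence(source_quality, chunk_count, semantic_match, verbatim):
    """Hypothetical multi-factor confidence scorer: each factor maps to
    High/Medium/Low, and the final label is the average level."""
    factors = [
        "High" if source_quality == "primary" else "Medium",
        "High" if chunk_count >= 3 else "Medium",
        "High" if semantic_match >= 0.8
        else ("Medium" if semantic_match >= 0.5 else "Low"),
        "High" if verbatim else "Medium",
    ]
    avg = sum(LEVELS[f] for f in factors) / len(factors)
    if avg >= 2.5:
        return "High"
    if avg >= 1.5:
        return "Medium"
    return "Low"
```

Averaging (rather than taking the minimum) means one weak factor cannot sink an otherwise well-supported answer, which matches the "average of factors" rule stated above.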
Backend (Python 3.10+ + FastAPI)
app.py — Main FastAPI application with 5 endpoints:
`GET /health` — Health check with environment variable verification (returns 'GEMINI_API_KEY: set' or 'missing', never exposes actual key)
`POST /ask` — Main RAG Q&A endpoint (accepts question, returns answer with confidence and sources)
`POST /classify-intent` — Intent classification only (returns intent type and extracted entities like figure numbers)
`POST /expand-context` — Show surrounding chunks for a given chunk ID (no additional LLM/FAISS calls, just metadata lookup)
`POST /generate-chat-title` — Generate short title from question (uses Gemini to create 3-5 word summary)
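The /expand-context endpoint is the simplest of the five: a pure list lookup over the parsed metadata, no LLM or FAISS calls. A sketch, assuming metadata.json parses to an ordered list of {'id', 'text'} dicts (the key names are illustrative):

```python
def expand_context(metadata, chunk_id, window=1):
    """Return the matched chunk plus `window` chunks on each side,
    clamped to the document boundaries. Mirrors the /expand-context
    behaviour described above: metadata lookup only."""
    idx = next(i for i, c in enumerate(metadata) if c["id"] == chunk_id)
    lo = max(0, idx - window)
    hi = min(len(metadata), idx + window + 1)
    return metadata[lo:hi]
```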
intent_classifier.py — Pattern-based intent classification with regex for figure numbers, table numbers, page numbers, section identifiers. Falls...
rebuild_index.py — Script to rebuild FAISS index from PDF:
Extract text chunks (paragraph-level, preserving structure)
Embed each chunk with gemini-embedding-001
Build FAISS index with IndexFlatL2 (exact search, CPU-friendly)
Save faiss.index, metadata.json (chunk text, page numbers, figure/table IDs), vision_captions.json (pre-generated page image captions)
Gitignored (faiss.index can be large, regenerated locally)
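Step 1 of the rebuild script (paragraph-level chunk extraction) can be sketched as below. The blank-line split, the minimum-length filter, and the metadata keys are illustrative assumptions; the real rebuild_index.py preserves more structure (figure/table IDs, section headers).

```python
import re

def chunk_paragraphs(page_text, page_number, min_chars=40):
    """Split one page's text into paragraph-level chunks: split on
    blank lines, collapse internal whitespace, drop fragments shorter
    than `min_chars`, and tag each chunk with its page number so
    PAGE_QUERY retrieval can match on it later."""
    chunks = []
    for para in re.split(r"\n\s*\n", page_text):
        para = " ".join(para.split())
        if len(para) >= min_chars:
            chunks.append({"text": para, "page": page_number})
    return chunks
```

Each chunk produced here would then be embedded with gemini-embedding-001 and added to the IndexFlatL2 index, with the dict saved into metadata.json at the matching row position.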
Frontend (React 19 + Vite 7)
App.jsx — Main component with chat interface:
Multi-chat support: create/switch/delete conversations, stored in localStorage
Message list with user questions and bot answers (structured paragraphs/lists, confidence badge, source citations)
Input field with submit button and loading indicator
Dark/light theme toggle (persists to localStorage)
api.js — API client for FastAPI backend:
Fetch wrapper with error handling
Endpoints for /ask, /classify-intent, /expand-context, /generate-chat-title
Environment variable VITE_API_URL for backend URL (localhost:8000 in dev, production URL in deploy)
PDF Export — jsPDF export button captures conversation:
Generates PDF with conversation title, timestamp, all Q&A pairs, confidence scores, sources
Downloads as `chat-export-{timestamp}.pdf`
Makefile Automation (15 commands)
`make install` — Install Python dependencies (requirements.txt) + Node dependencies (frontend/package.json), optionally in venv
`make setup-env` — Copy .env.example → .env for local configuration
`make check-env` — Verify GEMINI_API_KEY is set in environment
`make dev` — Run backend (uvicorn on port 8000) + frontend (Vite on port 5173) concurrently
`make dev-backend` — Backend only (uvicorn app:app --reload --port 8000)
`make dev-frontend` — Frontend only (cd frontend && npm run dev)
`make health` — Curl http://localhost:8000/health to verify backend responding
`make status` — Show running processes (backend/frontend)
`make build` — Production frontend build (frontend/dist/)
`make lint` — Lint frontend (ESLint)
`make lint-backend` — Lint Python (requires black/flake8, optional)
`make kill` — Kill dev server processes (backend/frontend)
`make rebuild-index` — Run rebuild_index.py to regenerate FAISS index from PDFs
`make clean` — Remove build artifacts (__pycache__, frontend/dist/)
`make clean-all` — Deep clean (remove node_modules, venv, faiss.index, metadata.json)
`make verify-deploy` — Run scripts/verify-deployment.sh for pre-deployment security check
GitHub Actions CI/CD (.github/workflows/ci.yml)
Runs on every push and pull request:
Lint Frontend — Install npm deps, run ESLint on frontend/src/
Build Frontend — Production build (npm run build), verify frontend/dist/ created
Lint Backend (optional) — Install black/flake8, lint Python files
Security Scan — Check for committed secrets (.env files, API keys) using grep/ack, fail if found
Comprehensive Documentation
SETUP.md — Detailed setup guide: prerequisites (Python 3.10+, Node 18+, Gemini API key), installation steps (clone, venv setup, install deps,...
DEPLOYMENT.md — Deployment guides for 5 platforms:
Vercel: Connect GitHub repo, configure build settings (root: ., build command: cd frontend && npm run build, output: frontend/dist), set...
GCP Cloud Run: Build Docker image, push to Container Registry, deploy with Cloud Run, configure secrets (GEMINI_API_KEY in Secret Manager)
Railway: Connect repo, configure start command (uvicorn app:app --host 0.0.0.0 --port $PORT), set environment variables
Render: Connect repo, configure build/start commands, set environment variables
Docker: Multi-stage Dockerfile (build frontend → copy to backend → serve with FastAPI), docker build/run commands
CONTRIBUTING.md — Contribution guidelines: fork/clone/branch workflow, code style (black for Python, ESLint for JS), commit message conventions,...
SECURITY.md — Security policy: responsible disclosure process, supported versions, known issues, security best practices (API keys in env only,...
CHANGELOG.md — Version history: v1.0.0 initial release, v1.1.0 added intent classification, v1.2.0 added confidence scoring, v2.0.0 React 19...
Security-First Approach
API key server-only: GEMINI_API_KEY loaded from environment on backend, never sent to frontend. Health endpoint returns 'set' or 'missing' status...
.env gitignored: .gitignore blocks .env, .env.local, .env.*, ensuring secrets never committed. .env.example template with placeholder values safe to...
Pre-commit hooks: .pre-commit-config.yaml runs secret detection (detect-private-key, check-added-large-files) before each commit.
Secret scanning in CI: GitHub Actions workflow fails build if .env files or API key patterns detected in committed files.
verify-deploy check: scripts/verify-deployment.sh runs automated security checks (no .env files, no API keys in code, faiss.index gitignored) before...
Deployment Options
Vercel (Easiest) — Full-stack serverless, automatic deployments on git push, environment variables via dashboard
GCP Cloud Run (Medium) — Scalable containerized deployment, Cloud Build integration, Secret Manager for API keys
Railway (Easy) — Git-push deploy, automatic HTTPS, environment variables via dashboard
Render (Easy) — Git-push deploy, free tier available, environment variables via dashboard
Docker (Medium) — Any container platform (AWS ECS, Azure Container Instances, DigitalOcean), multi-stage Dockerfile provided
Design Decisions
Chose FAISS over Pinecone/Weaviate for vector search — FAISS is CPU-friendly (no GPU required), free (no API costs), works offline, and sufficient for...
Implemented intent classification with 6 query types (figure, table, page, section, general, comparison) to route between exact match...
Used Google Gemini gemini-embedding-001 for embeddings and gemini-2.5-flash for generation — cost-effective (~$0.0001/1K tokens embedding, ~$0.002/1K...
Structured JSON output from LLM — prompt explicitly requests JSON format with 'answer' (paragraphs/lists), 'sources' (pages/figures/tables),...
Multi-factor confidence scoring — combines source quality (primary source document = High, secondary mentions = Medium), chunk count (single chunk =...
Context expansion endpoint (/expand-context) shows surrounding chunks without additional LLM or FAISS calls — just metadata lookup by chunk ID. Useful...
API key server-only with health endpoint — GEMINI_API_KEY loaded from environment on backend, never sent to frontend. /health returns 'GEMINI_API_KEY:...
React 19 + Vite 7 frontend — modern React with concurrent features, Vite provides instant HMR and fast production builds, simpler than Next.js for SPA...
Makefile for project automation — 15 commands (install, setup-env, dev, build, clean, rebuild-index, verify-deploy) simplify development workflow,...
GitHub Actions CI/CD with security scanning — lint/build checks catch errors before merge, secret scanning (grep/ack for .env files, API key patterns)...
Comprehensive documentation — SETUP.md (detailed setup), DEPLOYMENT.md (5 platforms), CONTRIBUTING.md (contribution guidelines), SECURITY.md (security...
Pre-commit hooks with .pre-commit-config.yaml — runs detect-private-key, check-added-large-files before each commit. Prevents secrets and large files...
Gitignored data files (faiss.index, metadata.json, vision_captions.json) — these are generated locally from PDFs via rebuild_index.py. Keeps repo size...
Dark/light theme with localStorage persistence — accommodates different reading preferences (dark mode for low-light environments, light for daylight),...
Multi-chat support with localStorage — enables organizing questions by topic (e.g., separate chats for different PDF documents or question categories),...
Tradeoffs & Constraints
FAISS IndexFlatL2 exact search — provides highest accuracy (no approximation errors) but doesn't scale to millions of vectors. For larger datasets...
faiss.index must be pre-built from PDFs via rebuild_index.py — not generated at runtime (would be too slow for large documents). Requires running...
Google Gemini API costs — embedding ~$0.0001/1K tokens, generation ~$0.002/1K tokens. Controlled via vision caption caching (pre-generate page image...
Single-document focus — optimized for querying one PDF at a time. Multi-document support would require index management (separate FAISS indexes per...
No real-time indexing — PDFs must be indexed offline via rebuild_index.py. Real-time indexing (index new PDFs on upload) would require background job...
Structured JSON responses from LLM — rely on Gemini following prompt instructions. Occasionally LLM returns malformed JSON or ignores structure....
React SPA frontend — great for interactive chat UI but misses SEO benefits of SSR (Next.js). Acceptable for internal tools but would need SSR for...
CPU-only FAISS — good for moderate-sized indexes (10K-100K vectors) but GPU-accelerated FAISS would be 10-100x faster for large-scale deployments....
No streaming responses — answers return all at once after LLM finishes. Streaming (token-by-token display) would improve perceived performance but...
Would improve: Add streaming responses for long answers, implement multi-document support with document selector UI, add real-time PDF indexing on...
Outcome & Impact
Production-ready RAG chatbot for technical PDF question-answering with intelligent retrieval and confidence scoring enabling users to quickly find...
Intent-aware retrieval with 6 query types routing to optimal strategy: FIGURE_QUERY ('What does Fig 3.3 show?') → exact figure reference matching,...
Multi-factor confidence scoring provides High/Medium/Low assessment from source quality (primary source = High, secondary mentions = Medium), chunk...
Structured JSON answers with paragraphs/lists format and source citations (page numbers, figure IDs, table IDs) enable clear, scannable responses...
5 FastAPI endpoints serving complete RAG workflow: GET /health (health check with GEMINI_API_KEY verification, never exposes actual key), POST /ask...
React 19 frontend with modern UX: dark/light theme toggle (persists to localStorage), multi-chat support (create/switch/delete conversations,...
Makefile automation with 15 commands simplifies development workflow: install (Python + Node deps), setup-env (copy .env.example → .env), check-env...
GitHub Actions CI/CD catches errors before merge: lint frontend (ESLint on frontend/src/), build frontend (production build, verify dist/ created),...
Comprehensive documentation enables self-service: SETUP.md (detailed setup: prerequisites, installation steps, FAISS index generation, local running,...
Security-first architecture protects sensitive credentials: API key server-only (GEMINI_API_KEY on backend only, never sent to frontend, health...
Flexible deployment options accommodate different hosting preferences: Vercel easiest (full-stack serverless, git-push deploy, environment variables...
FAISS vector search with Google Gemini embeddings provides semantic understanding — queries like 'crash barrier specifications' retrieve semantically...
Gitignored data files (faiss.index, metadata.json, vision_captions.json) keep repository size small — files generated locally via rebuild_index.py...
Pre-deployment checklist (`make verify-deploy`) automates security validation: checks no .env files committed, no API keys in source code,...
MIT license enables open research use — academic researchers, students, and developers can use, modify, and distribute the codebase for educational and...
Tech Stack
Backend: Python 3.10+, FastAPI (web framework), Uvicorn (ASGI server)
Vector Search: FAISS (CPU-based IndexFlatL2 exact nearest-neighbor), NumPy (numerical operations)
Embeddings / LLM: Google Gemini (gemini-embedding-001 for embeddings, gemini-2.5-flash for generation)
Vision: Pillow (page image processing), Gemini Vision API (image caption generation)
Frontend: React 19 (UI library with concurrent features), Vite 7 (build tool, dev server with instant HMR)
PDF Export: jsPDF (PDF generation from conversation data)
CI/CD: GitHub Actions (automated lint/build/security checks on push and PR)
Containerization: Docker (multi-stage Dockerfile for production deployment)
Automation: Makefile (15 commands for dev/build/deploy workflows)
Security: pre-commit hooks (detect-private-key, check-added-large-files), secret scanning in CI