AI Document Parser
Production-ready AI microservice accepting PDF, TXT, and EML files — extracts text, caches by SHA-256 hash in Redis 7 (1-hour TTL, AOF persistence), and returns structured GPT-4o-mini analysis (summary, document type, key points, important entities). FastAPI backend with Gunicorn + UvicornWorker, layered service architecture (document_service.py + llm_service.py), Langfuse LLM observability (prompt/response/tokens/cost per call), Loguru structured JSON logs.
Role
AI Engineer & Full-stack Developer
Team
Solo
Company/Organization
Personal Project
The Problem
Extracting structured insights from PDF, TXT, and EML documents required manual effort or separate tools — no single REST endpoint accepted all three formats and returned consistent, structured output.
Repeated analysis of identical documents (e.g., the same invoice re-uploaded) wasted LLM tokens and added unnecessary latency — there was no deduplication mechanism.
Zero visibility into LLM cost, token usage, or trace data per request made it impossible to debug expensive or slow AI calls in production.
The absence of structured logging made it difficult to correlate request metadata (filename, text length, cache status) with application errors.
Deploying the service alongside a Redis cache and a React frontend required manual coordination — no containerised full-stack setup existed for consistent environments.
The Solution
Built a FastAPI microservice with a clean layered architecture separating extraction, caching, and LLM concerns.
Backend Architecture
app/services/document_service.py — Text extraction + cache orchestration:
PDF: pypdf PdfReader, concatenates all page text
EML: stdlib email.message_from_bytes, walks MIME parts, extracts text/plain
TXT: UTF-8 decode with error fallback
Computes SHA-256 hash of extracted text, checks Redis async GET, returns cached result or continues to LLM
Unsupported MIME type raises HTTP 415; corrupt/unreadable file raises HTTP 422
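The extraction-plus-cache-key flow above can be sketched roughly as follows. Function and variable names are illustrative, not the project's actual identifiers, and the PDF branch is shown only as a comment to keep the sketch stdlib-only:

```python
import email
import hashlib
from email.message import Message


def extract_text(filename: str, data: bytes) -> str:
    """Extract plain text from a TXT or EML upload.

    The PDF branch is omitted here; it would use pypdf's PdfReader and
    concatenate page.extract_text() across all pages.
    """
    if filename.endswith(".txt"):
        # UTF-8 decode with replacement fallback for undecodable bytes
        return data.decode("utf-8", errors="replace")
    if filename.endswith(".eml"):
        msg: Message = email.message_from_bytes(data)
        parts = []
        # Walk MIME parts and keep only text/plain bodies
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True)
                if payload:
                    parts.append(payload.decode("utf-8", errors="replace"))
        return "\n".join(parts)
    raise ValueError("unsupported type")  # mapped to HTTP 415 in the API layer


def cache_key(text: str) -> str:
    # SHA-256 of the extracted text: identical content maps to the
    # same key regardless of the uploaded filename
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

The key is derived from content, not filename, which is what makes the dedup-by-hash caching described below work.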
app/services/llm_service.py — OpenAI call + Langfuse trace:
Sends extracted text to GPT-4o-mini with structured prompt requesting JSON: summary, document_type, key_points[], important_entities[]
Wraps call in Langfuse generation (start/end) recording model, prompt, response, usage tokens, and cost
Parses and validates JSON response; raises HTTP 502 on OpenAI failure or malformed JSON
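The validation step can be sketched like this — the OpenAI call itself is elided (it would be a `chat.completions` request whose message content arrives as `raw`), and the exception class name is illustrative:

```python
import json

REQUIRED_KEYS = {"summary", "document_type", "key_points", "important_entities"}


class UpstreamError(Exception):
    """Raised on OpenAI failure or malformed JSON; the API layer maps it to HTTP 502."""


def parse_analysis(raw: str) -> dict:
    # `raw` stands in for the assistant message content returned by the model
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise UpstreamError(f"model returned invalid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise UpstreamError(f"model response missing keys: {sorted(missing)}")
    if not isinstance(data["key_points"], list) or not isinstance(
        data["important_entities"], list
    ):
        raise UpstreamError("key_points and important_entities must be lists")
    return data
```

Validating before caching matters: a malformed response raises instead of being stored and served for the next hour.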
app/core/redis_client.py — Async Redis wrapper:
GET returns cached DocumentResponse or None
SET serialises to JSON with 3600s TTL
Redis 7 with AOF persistence ensures cache survives container restarts
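A minimal sketch of the wrapper, assuming a redis.asyncio-style client with `await get(key)` / `await set(key, value, ex=ttl)`; it is exercised here against an in-memory stand-in rather than a real Redis:

```python
import asyncio
import json

CACHE_TTL_SECONDS = 3600  # 1-hour TTL


class DocumentCache:
    """Thin async wrapper; `client` is anything exposing redis.asyncio-style
    get/set coroutines (class and method names are illustrative)."""

    def __init__(self, client):
        self.client = client

    async def get(self, key: str):
        raw = await self.client.get(key)
        # Cached responses are stored as JSON strings
        return json.loads(raw) if raw is not None else None

    async def set(self, key: str, value: dict) -> None:
        await self.client.set(key, json.dumps(value), ex=CACHE_TTL_SECONDS)


class FakeRedis:
    """In-memory stand-in for tests; ignores the TTL argument."""

    def __init__(self):
        self.store = {}

    async def get(self, key):
        return self.store.get(key)

    async def set(self, key, value, ex=None):
        self.store[key] = value


async def demo():
    cache = DocumentCache(FakeRedis())
    assert await cache.get("k") is None  # miss before anything is stored
    await cache.set("k", {"cached": True})
    return await cache.get("k")
```

Duck-typing the client keeps the wrapper testable without a Redis server and makes swapping the backing store trivial.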
app/core/langfuse_client.py — Langfuse singleton initialised from env vars (LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_BASE_URL); no-ops when credentials are missing, so the service runs without Langfuse configured
app/core/logger.py — Loguru sink configured for structured JSON output; every request logs filename, text_length, cached status, and processing time
app/api/v1/documents.py — POST /api/v1/documents/upload:
Accepts multipart file upload, validates extension (.pdf/.txt/.eml)
Calls document_service → on cache hit, return cached result; on miss → llm_service → SET cache → return result
Returns DocumentResponse: filename, text_length, cached (bool), llm_analysis
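The cache-or-analyse flow of the route can be modelled framework-free. The FastAPI wiring is omitted; `cache` (a plain dict standing in for Redis) and `analyze` (standing in for the llm_service call) are illustrative test doubles:

```python
import hashlib


def handle_upload(filename: str, text: str, cache: dict, analyze) -> dict:
    """Cache hit → return cached analysis; miss → analyze, store, return."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return {"filename": filename, "text_length": len(text),
                "cached": True, "llm_analysis": hit}
    analysis = analyze(text)
    cache[key] = analysis  # real code SETs Redis with a 3600s TTL
    return {"filename": filename, "text_length": len(text),
            "cached": False, "llm_analysis": analysis}
```

Uploading the same content under two different filenames calls the LLM only once — the second request is served from cache.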
app/schemas/document.py — Pydantic models: LLMAnalysis (summary, document_type, key_points, important_entities), DocumentResponse
Frontend (React 19 + TypeScript + Vite 7)
src/api/documentApi.ts — fetch wrapper using VITE_API_URL env var; POST multipart form, returns typed DocumentResponse
src/components/FileUpload.tsx — Drag-and-drop or click-to-browse, validates file type client-side, calls API on submit
src/components/ResultCard.tsx — Renders LLM analysis with summary, document type badge, key points list, entities list, cache hit indicator
src/components/Loader.tsx — Spinner shown during API call
src/types/document.ts — TypeScript interfaces mirroring Pydantic schemas
Nginx production image (Node build → Nginx static serve + /api reverse proxy to backend)
Caching Strategy
SHA-256 hash of extracted text content as cache key — identical documents (regardless of filename) always hit cache
1-hour TTL balances freshness with LLM cost savings
Redis 7 AOF persistence ensures cache survives restarts, avoiding cold-start cost spikes
`cached: true` field in response gives clients visibility into cache hit status
Observability
Langfuse traces every LLM call: prompt sent, response received, model used, token counts (prompt + completion), estimated cost
Langfuse dashboard provides cost analytics, latency histograms, and error rates across all document processing calls
Loguru structured JSON logs on every request for correlation with application monitoring
Docker + Deployment
Backend: multi-stage Dockerfile (Python 3.11-slim → poetry install → Gunicorn + UvicornWorker, runs as non-root UID 1001)
Frontend: multi-stage Dockerfile (Node 20 build → Nginx production image with SPA fallback + /api reverse proxy)
docker-compose.yml orchestrates api + frontend + redis with internal network (Redis not exposed externally)
Full stack: `make docker-up`; development: `make dev` (concurrent backend + frontend + Redis)
CI/CD (GitHub Actions)
backend job: Ruff lint, Ruff format check, pytest
frontend job: ESLint, TypeScript type-check (tsc --noEmit), Vite production build
docker job: build backend image, build frontend image
Runs on every push and pull request
Design Decisions
Chose SHA-256 content hash as cache key (not filename) — identical documents with different names always hit cache, and renamed files don't pollute the cache with duplicate entries.
Separated document_service.py (extraction + cache) from llm_service.py (OpenAI + Langfuse) — keeps LLM logic isolated, making it easy to swap models or add providers.
Used Redis 7 with AOF persistence over in-memory caching — survives container restarts, is shareable across multiple API replicas, and TTL management is built in.
Langfuse for LLM observability over custom logging — provides out-of-the-box cost tracking, latency histograms, and prompt/response replay. Optional: the service runs without it when credentials are unset.
Gunicorn + UvicornWorker over plain Uvicorn in production — Gunicorn manages worker lifecycle (crashes, memory limits), UvicornWorker provides async ASGI request handling.
GPT-4o-mini over GPT-4o — cost-effective for structured extraction tasks (~10x cheaper), sufficient quality for summary/classification/entity extraction.
Nginx reverse proxy in frontend Docker image — SPA fallback (all routes serve index.html), /api proxy to backend, static asset caching headers.
Production Docker image runs as non-root user (UID 1001) — security best practice for containerised services, prevents privilege escalation if the container is compromised.
Poetry for backend dependency management — lockfile ensures reproducible installs across dev, CI, and production. Separation of main and dev dependency groups keeps production installs lean.
Pydantic settings (app/core/config.py) reads all config from environment — single source of truth for configuration, type-validated, works with .env files for local development.
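pydantic-settings pulls these values from the environment (and .env) automatically; the same pattern can be sketched with the stdlib — field and env-var names here (other than OPENAI_API_KEY) are illustrative, not the project's actual config:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Stdlib sketch of env-driven config; the real service uses
    pydantic-settings, which adds type validation and .env loading."""
    openai_api_key: str
    redis_url: str
    cache_ttl_seconds: int

    @classmethod
    def from_env(cls) -> "Settings":
        return cls(
            # Required: fail fast at startup if the key is missing
            openai_api_key=os.environ["OPENAI_API_KEY"],
            # Defaults assume the Docker Compose service names
            redis_url=os.environ.get("REDIS_URL", "redis://redis:6379/0"),
            cache_ttl_seconds=int(os.environ.get("CACHE_TTL_SECONDS", "3600")),
        )
```

Centralising config this way means the same image runs in dev, CI, and production with only environment changes.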
Tradeoffs & Constraints
SHA-256 cache key means any text change (even whitespace) produces a cache miss — intentional for correctness, but means slightly reformatted identical documents are re-analysed.
Synchronous PDF extraction with pypdf — blocking for very large PDFs. For production scale with 100+ page documents, would move extraction to a background job queue.
Single LLM call per document — no chunking for very long documents. GPT-4o-mini context window (~128K tokens) handles most business documents; chunking would be needed beyond that.
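The chunking mentioned as future work could look roughly like this: split the text into overlapping character-budget chunks, summarise each, then merge. Token counting and the merge call are elided, and the sizes are illustrative:

```python
def chunk_text(text: str, max_chars: int = 12000, overlap: int = 500) -> list[str]:
    """Greedy character-budget chunking with overlap — a rough proxy for
    token-based splitting; only needed past the model's context window."""
    if len(text) <= max_chars:
        return [text]
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Overlap so a sentence cut at a chunk boundary appears whole
        # in the next chunk
        start = end - overlap
    return chunks
```

Each chunk would get its own LLM call, with a final call merging the per-chunk summaries into one analysis.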
Langfuse tracing adds ~20-50ms latency per LLM call (async flush) — acceptable overhead for the observability value; can be disabled via env vars in latency-sensitive deployments.
Redis TTL of 1 hour — documents with rapidly changing content (live reports) would return stale analysis. TTL is configurable via env var but not per-document.
No authentication or rate limiting on upload endpoint — suitable for internal or demo use; would require API key middleware or OAuth for public-facing deployment.
Would improve: add chunked processing for large documents, per-user rate limiting, webhook support for async processing of large files, and support for additional file formats.
Outcome & Impact
Structured document analysis API returning consistent JSON (summary, document_type, key_points[], important_entities[]) for PDF, TXT, and EML uploads.
Sub-100ms cache hits for repeated documents via Redis SHA-256 keyed cache — the LLM is bypassed entirely, with the `cached: true` field in the response confirming the hit.
Full LLM observability via Langfuse: every GPT-4o-mini call traced with prompt, response, token usage (prompt + completion), and estimated cost.
Structured JSON logs on every request via Loguru: filename, text_length, cached status, processing time — correlatable with application monitoring and alerting.
GitHub Actions CI enforces quality on every commit: Ruff lint + format check, pytest, ESLint, TypeScript type-check, and Docker image builds for both services.
One-command local development stack (`make dev`) and one-command Docker deployment (`make docker-up`) with Redis, backend, and frontend orchestrated by Docker Compose.
Security: Redis not exposed outside Docker network, production image runs as non-root UID 1001, CORS restricted to configured origins, secrets supplied via environment variables.
Tech Stack
Backend: Python 3.11, FastAPI (web framework), Gunicorn + UvicornWorker (production ASGI server)
LLM: OpenAI GPT-4o-mini (document analysis — summary, type, key points, entities)
Caching: Redis 7 (AOF persistence, 1-hour TTL, SHA-256 content-keyed)
Observability: Langfuse (LLM cost + trace per call), Loguru (structured JSON request logs)
Text Extraction: pypdf (PDF), stdlib email (EML), UTF-8 decode (TXT)
Frontend: React 19, TypeScript, Vite 7 (build tool + dev server)
Containerisation: Docker (multi-stage builds — non-root production images), Docker Compose (full stack)
Web Server: Nginx (SPA fallback + /api reverse proxy in frontend image)
CI/CD: GitHub Actions (Ruff lint, pytest, ESLint, TypeScript check, Docker build on every push)
Dependency Management: Poetry (backend lockfile), npm (frontend)