AgentOps AI Platform
Production-ready multi-agent AI system built with LangChain, LangGraph, and FastAPI. Four specialized agents: Supervisor (planning/routing), Research (conditional web search via DuckDuckGo), Execution (output generation), and Evaluator (quality scoring 1-10, with one retry if score < 7).
Role
AI Engineer & Full-stack Developer
Team
Solo
Company/Organization
Personal Project (Self-Study)
The Problem
Complex AI tasks require multi-step planning, research, execution, and quality validation that single LLM calls cannot handle — no structured way to...
Building production-ready agent systems with proper observability (tracing, metrics, cost tracking) requires integrating multiple monitoring...
No easy way to implement semantic memory that automatically persists high-quality outputs and retrieves relevant context for future tasks — reducing...
Quality control for AI outputs requires automatic evaluation and retry mechanisms, but most agent frameworks don't include self-evaluation with...
Agent systems often require expensive API keys for every tool (search APIs, news APIs) — needed free, keyless alternatives that work reliably for...
Development workflow for multi-component systems (Python backend + Node frontend + observability) lacks unified tooling — needed consistent commands...
Security for AI systems with API keys requires comprehensive protection: gitignored secrets, pre-push scanning, CI/CD secret detection, and offline...
The Solution
Built a production-ready multi-agent AI platform with LangGraph state machine orchestration and comprehensive tooling.
Agent Pipeline (LangGraph Workflow)
User Goal → Supervisor → [Research if needed] → Execution → Evaluator → Response (retry if score < 7)
Supervisor Agent (supervisor_agent.py) — Creates step-by-step execution plan from user goal. Decides if research is needed (queries requiring...
Research Agent (research_agent.py) — Conditional: only runs when Supervisor requests. Uses DuckDuckGo web search (tools/web_search.py) to gather...
Execution Agent (execution_agent.py) — Generates final output following Supervisor's plan and Research findings (if available). Produces...
Evaluator Agent (evaluator_agent.py) — Scores output quality 1-10 with reasoning: clarity (1-3), accuracy (1-3), structure (1-3), completeness...
LangGraph Orchestration
(graphs/main_graph.py):
State machine with typed nodes (Supervisor, Research, Execution, Evaluator)
Conditional edges: Supervisor → Research (if research_needed=True) else → Execution
Linear flow: Research → Execution → Evaluator
Retry logic: Evaluator → Execution (if score < 7, once) → Evaluator
Final state: return response to user
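The routing and retry rules above can be sketched as plain Python control flow. This is a minimal sketch only: the real implementation builds a LangGraph state machine in graphs/main_graph.py, and the agent callables here are hypothetical stand-ins passed in for illustration.

```python
def run_pipeline(goal, supervisor, research, execute, evaluate, pass_threshold=7):
    """Sketch of the workflow: plan, conditionally research, execute, evaluate,
    and retry execution once if the score is below the threshold."""
    plan = supervisor(goal)                       # Supervisor: plan + research decision
    findings = research(goal) if plan["research_needed"] else None  # conditional Research
    output = execute(goal, plan, findings)        # Execution: produce the output
    score = evaluate(output)                      # Evaluator: 1-10 quality score
    if score < pass_threshold:                    # single retry on a low score
        output = execute(goal, plan, findings)
        score = evaluate(output)
    return {"final_output": output, "score": score, "passed": score >= pass_threshold}
```

The single-retry rule mirrors the Evaluator → Execution edge described above: one extra pass bounds latency while still giving low-scoring outputs a second chance.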
Semantic Memory System
memory_store.py — JSON file storage (memory/memory.json, gitignored) for persistence across sessions. CRUD operations for memory entries (id,...
vector_store.py — ChromaDB (memory/chroma_db/, gitignored) with Google Embeddings for semantic similarity search. Auto-saves outputs with score ≥...
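The save/retrieve policy can be illustrated with a tiny in-memory version. This is a sketch only: the real store uses ChromaDB with Google Embeddings, and the `embed` callable here is a placeholder injected by the caller.

```python
import math

PASS_SCORE = 7  # only outputs scoring >= 7 are persisted

def cosine(a, b):
    """Cosine similarity between two vectors (what ChromaDB computes for us)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class TinyMemory:
    """In-memory stand-in for the ChromaDB store (illustration only)."""
    def __init__(self, embed):
        self.embed = embed      # placeholder for Google Embeddings
        self.entries = []       # list of (vector, text) pairs

    def maybe_save(self, text, score):
        if score >= PASS_SCORE:               # high-quality outputs persist
            self.entries.append((self.embed(text), text))

    def retrieve(self, query, k=3):
        qv = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```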
FastAPI Backend
backend/main.py — FastAPI app with CORS, health endpoint, app startup/shutdown lifecycle
backend/routers/run.py — POST /run: accepts {goal: string}, runs LangGraph workflow, returns {final_output, evaluation: {passed, score, reasons},...
backend/routers/history.py — GET /history: returns past runs from memory_store (last 50 by default)
GET /health — Status check with Google API key verification (returns set/missing, never exposes key)
GET /docs — Swagger UI for interactive API documentation
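The POST /run response shape can be sketched with stdlib dataclasses. The real backend uses Pydantic models for validation; only the fields named above (final_output, evaluation.passed/score/reasons) are taken from the source, and any further fields are unknown.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Evaluation:
    passed: bool
    score: int                                  # 1-10 from the Evaluator agent
    reasons: list[str] = field(default_factory=list)

@dataclass
class RunResponse:
    final_output: str
    evaluation: Evaluation

# asdict() yields the JSON-serializable shape returned by POST /run
resp = RunResponse("Done.", Evaluation(passed=True, score=8, reasons=["clear"]))
payload = asdict(resp)
```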
Observability (Both Optional)
LangSmith (observability/langsmith.py) — Full tracing of agent execution: each agent call, LLM invocations, timing, token counts. Enable: set...
Langfuse (observability/langfuse.py) — Metrics, evaluation scores, cost tracking per task. Enable: set LANGFUSE_SECRET_KEY + LANGFUSE_PUBLIC_KEY...
trace_utils.py — Unified helpers that work with both or neither observability platform. OBSERVABILITY_ENABLED=1 to enable both simultaneously.
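A minimal sketch of how such a no-op-by-default helper can work; the decorator name and behavior here are assumptions for illustration, not the actual trace_utils.py API.

```python
import functools
import os
import time

def traced(fn):
    """Wrap an agent call with tracing only when observability is enabled.

    When OBSERVABILITY_ENABLED is unset (the default), the function runs
    untouched; no LangSmith/Langfuse client is ever imported or contacted.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if os.environ.get("OBSERVABILITY_ENABLED") != "1":
            return fn(*args, **kwargs)          # observability off: plain call
        start = time.perf_counter()             # observability on: time the call
        try:
            return fn(*args, **kwargs)
        finally:
            # A real helper would forward a span to LangSmith and/or Langfuse here.
            print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
    return wrapper
```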
Next.js 14 Frontend
page.tsx — Main page with two-panel layout (input left, results right)
AgentInput.tsx — Goal text input, Execute button, loading spinner during agent run
ResultDisplay.tsx — Final output with markdown rendering, evaluation badge (score + passed status), memory_used indicator
HistoryList.tsx — Past runs with goal preview, score, timestamp, expandable detail
TypeScript throughout with strict type checking
Responsive CSS with globals.css
Makefile Automation (16 commands)
`make dev` — Start backend (port 8000) + frontend (port 3000) with hot reload
`make backend` — Backend only (uvicorn backend.main:app --reload --port 8000)
`make frontend` — Frontend only (cd frontend && npm run dev)
`make install` — Install Python (pip install -r requirements.txt) + Node deps (cd frontend && npm install)
`make venv` — Create Python virtual environment (.venv/)
`make lint` — Run Ruff (Python) + ESLint (frontend) linting
`make format` — Auto-format Python (black/ruff --fix) + frontend (prettier)
`make test` — Run Python test suite (pytest tests/)
`make pre-push` — Run format + lint + security scan (recommended before every push)
`make security-check` — Run scripts/check-secrets.sh to scan for leaked API keys
`make check` — Verify GOOGLE_API_KEY set and Python/Node dependencies installed
`make build` — Production frontend build (cd frontend && npm run build)
`make prod` — Run backend in production mode (without --reload)
`make kill` — Kill processes on ports 8000 and 3000
`make clean` — Remove __pycache__, .next/, dist/, build artifacts
(+ venv creation and environment setup utilities)
GitHub Actions CI/CD (.github/workflows/ci.yml)
Runs on every push and PR:
Python setup and dependency install
Ruff linting
pytest test suite
Frontend npm install + ESLint + build
...
Security Architecture
Zero hardcoded secrets — all credentials from environment variables only
.gitignore: .env*, .env.local, memory/memory.json, memory/chroma_db/, credentials.json, *.key, *.pem
.gitattributes for consistent line endings and diff settings
scripts/check-secrets.sh: bash script scanning for common secret patterns (GOOGLE_API_KEY=, sk-lf-, LANGSMITH)
make pre-push: combines format + lint + security-check for comprehensive pre-push validation
OFFLINE_MODE=1: skip LLM calls for testing without API costs or internet access
GEMINI_MODEL env var: override model (default: gemini-2.0-flash) for testing with different models
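The OFFLINE_MODE switch can be implemented as a thin gate in front of the LLM client. This is a sketch under assumptions: the function name and mock text are hypothetical, not the project's actual code.

```python
import os

def call_llm(prompt, real_call):
    """Route a prompt to the real model, or return a mock when offline.

    real_call is the actual client invocation (e.g. a Gemini call); passing it
    in means tests can run with OFFLINE_MODE=1 and never touch the network.
    """
    if os.environ.get("OFFLINE_MODE") == "1":
        return f"[offline mock] {prompt[:40]}"   # deterministic stand-in reply
    return real_call(prompt)
```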
Design Decisions
Chose LangGraph over custom orchestration — state machine-based workflow with typed nodes, conditional edges, and built-in cycle support. More...
Used Google Gemini 2.0 Flash for all 4 agents — cost-effective (~$0.001/task at ~$0.00015/1K tokens), fast response times (2-5 seconds per agent call),...
Implemented Evaluator agent with 1 retry threshold (score < 7) — single retry balances quality improvement with latency. Multiple retries would...
Selected ChromaDB for semantic memory — embedded (no external service), free, provides efficient cosine similarity search. JSON file for metadata...
Made observability optional — LangSmith and Langfuse both disabled by default (OBSERVABILITY_ENABLED=1 to enable). System works fully without either,...
Chose DuckDuckGo for web search — no API key required (eliminates authentication complexity), free and unlimited for development/testing, returns...
Built conditional Research agent — Supervisor decides if research needed rather than always searching. Reduces latency for knowledge tasks (math,...
Used FastAPI with Pydantic — automatic request/response validation, built-in Swagger docs (/docs), async support for concurrent requests, type hints...
Added OFFLINE_MODE environment variable — skip LLM calls for testing logic without API costs or internet. Returns mock responses for unit testing agent...
Separated backend/src/memory/tools/observability/frontend directories — clear separation of concerns, each module independently testable, easy to add...
Implemented make pre-push as comprehensive pre-commit gate — format + lint + security scan runs before pushing, catches formatting inconsistencies,...
Added GitHub PR template and issue templates — ensures consistent PR descriptions with checklist (tests pass, make pre-push run, security check done),...
Tradeoffs & Constraints
Chose LangGraph workflow over autonomous agents — more predictable and auditable (you know exactly which agents run in what order), but less flexible...
Single retry (score < 7) instead of multiple retries — balances quality with latency (10-20 seconds per task). Multiple retries could improve quality...
ChromaDB embedded storage — perfect for single-instance deployment (no external service), but would need Pinecone/Weaviate for distributed...
DuckDuckGo web search — free and keyless but less comprehensive than paid alternatives (Google Search API, Bing API). Results can be inconsistent, no...
Memory limits (100 entries, 90-day retention) — prevents unbounded storage growth, keeps retrieval fast, maintains relevance. Trade completeness for...
Optional observability (disabled by default) — reduces required API keys and setup complexity, but means new deployments have no monitoring until...
Google Gemini 2.0 Flash for all agents — cost-effective (~$0.001/task) but less capable than GPT-4 or Gemini 1.5 Pro for complex reasoning tasks. Trade...
JSON file for memory persistence (memory.json) — simple and portable but not suitable for concurrent access (multiple simultaneous requests could...
No real-time streaming — results return all at once after full agent pipeline completes (10-20 seconds). Token streaming would improve perceived...
Stateless FastAPI design — horizontally scalable and simple, but each request loads ChromaDB fresh. Session-based context would require persistent...
Would improve: Add SSE streaming for real-time token display, implement multi-user support with isolated memory per user, add more tools (code...
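The 100-entry / 90-day retention policy above can be sketched as a simple pruning pass; the entry shape and function name are assumptions, not the project's actual memory_store.py code.

```python
from datetime import datetime, timedelta, timezone

MAX_ENTRIES = 100
MAX_AGE = timedelta(days=90)

def prune(entries, now=None):
    """Drop entries older than 90 days, then keep only the newest 100.

    Each entry is assumed to be a dict with a timezone-aware 'created_at'.
    """
    now = now or datetime.now(timezone.utc)
    fresh = [e for e in entries if now - e["created_at"] <= MAX_AGE]
    fresh.sort(key=lambda e: e["created_at"], reverse=True)  # newest first
    return fresh[:MAX_ENTRIES]
```

Age-based pruning runs first so a burst of recent entries cannot keep stale context alive past the retention window.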
Outcome & Impact
Production-ready multi-agent AI system executing complex tasks through structured 4-agent pipeline: Supervisor (planning, memory retrieval) →...
Cost-effective execution at ~$0.001 per task (~$1/month for 1000 tasks) using Google Gemini 2.0 Flash for all 4 agents. Performance: 10-20 seconds...
3 FastAPI REST endpoints with Swagger docs: POST /run (accepts {goal: string}, returns {final_output, evaluation: {passed, score, reasons},...
ChromaDB semantic memory system: automatically saves successful outputs (score ≥ 7) with Google Embeddings, retrieves relevant past context during...
4 specialized agents implemented as separate Python modules: supervisor_agent.py (planning, routing, memory retrieval), research_agent.py (DuckDuckGo...
LangGraph state machine workflow (graphs/main_graph.py): typed nodes, conditional edges (Supervisor → Research OR Execution), retry logic (Evaluator...
Dual optional observability: LangSmith (LANGSMITH_API_KEY + LANGCHAIN_TRACING_V2=true → full agent execution tracing at smith.langchain.com),...
Next.js 14 TypeScript frontend with 3 components: AgentInput.tsx (goal input, Execute button, loading state), ResultDisplay.tsx (output with...
DuckDuckGo web search integration (tools/web_search.py) — no API key required, free and unlimited, returns structured results. Only invoked when...
Makefile with 16 commands covering full development lifecycle: dev (backend + frontend with hot reload), install (Python + Node deps), venv (create...
GitHub Actions CI/CD (.github/workflows/ci.yml): runs on every push and PR with Python setup/install, Ruff lint, pytest tests, frontend npm install +...
Security-first with zero hardcoded credentials: .gitignore covers .env*, memory/memory.json, memory/chroma_db/, credentials.json, *.key, *.pem;...
Environment variable configuration: GOOGLE_API_KEY (required), GEMINI_MODEL (optional, default gemini-2.0-flash), LANGSMITH_API_KEY (optional),...
Comprehensive documentation: DEPLOYMENT.md (Vercel deployment with npm i -g vercel && vercel --prod, GCP Cloud Run with gcloud run deploy...
Clean project structure: backend/ (FastAPI app + routers), src/agentops_ai_platform/ (agents/ + graphs/), memory/ (memory_store.py +...
Tech Stack
Orchestration: LangChain, LangGraph (state machine-based multi-agent workflows with conditional edges)
LLM: Google Gemini 2.0 Flash (all 4 agents — Supervisor, Research, Execution, Evaluator)
Backend: Python 3.10+, FastAPI, Uvicorn (ASGI), Pydantic (request/response validation)
Memory: ChromaDB (vector store for semantic search), Google Embeddings (sentence embeddings)
Tools: DuckDuckGo Search (web research — no API key required)
Observability: LangSmith (agent tracing — optional), Langfuse (metrics/cost tracking — optional)
Frontend: Next.js 14 (App Router), React 18, TypeScript (strict mode)
CI/CD: GitHub Actions (lint/test/build/security scan on push and PR)
Security: scripts/check-secrets.sh (pre-push secret scanner), OFFLINE_MODE for testing
Automation: Makefile (16 commands for dev/install/quality/security/build/maintenance)