
AgentOps AI Platform

Production-ready multi-agent AI system built with LangChain, LangGraph, and FastAPI. It uses four specialized agents: Supervisor (planning/routing), Research (conditional web search via DuckDuckGo), Execution (output generation), and Evaluator (quality scoring 1-10, with one retry if the score is below 7).

Python 3.10+, FastAPI, Uvicorn, Pydantic, LangChain, LangGraph, Google Gemini 2.0 Flash, ChromaDB, Google Embeddings, Next.js 14, React 18, TypeScript, LangSmith, Langfuse, DuckDuckGo Search, GitHub Actions, Makefile

Role

AI Engineer & Full-stack Developer

Team

Solo

Company/Organization

Personal Project (Self-Study)

The Problem

Complex AI tasks require multi-step planning, research, execution, and quality validation that single LLM calls cannot handle: no structured way to...

Building production-ready agent systems with proper observability (tracing, metrics, cost tracking) requires integrating multiple monitoring...

No easy way to implement semantic memory that automatically persists high-quality outputs and retrieves relevant context for future tasks, reducing...

Quality control for AI outputs requires automatic evaluation and retry mechanisms, but most agent frameworks don't include self-evaluation with...

Agent systems often require expensive API keys for every tool (search APIs, news APIs); needed free, keyless alternatives that work reliably for...

Development workflow for multi-component systems (Python backend + Node frontend + observability) lacks unified tooling; needed consistent commands...

Security for AI systems with API keys requires comprehensive protection: gitignored secrets, pre-push scanning, CI/CD secret detection, and offline...

The Solution

Built a production-ready multi-agent AI platform with LangGraph state machine orchestration and comprehensive tooling.

Agent Pipeline (LangGraph Workflow)

User Goal → Supervisor → [Research if needed] → Execution → Evaluator → Response (retry if score < 7)

Supervisor Agent (supervisor_agent.py) — Creates step-by-step execution plan from user goal. Decides if research is needed (queries requiring...

Research Agent (research_agent.py) — Conditional: only runs when Supervisor requests. Uses DuckDuckGo web search (tools/web_search.py) to gather...

Execution Agent (execution_agent.py) — Generates final output following Supervisor's plan and Research findings (if available). Produces...

Evaluator Agent (evaluator_agent.py) — Scores output quality 1-10 with reasoning: clarity (1-3), accuracy (1-3), structure (1-3), completeness...

LangGraph Orchestration (graphs/main_graph.py)

State machine with typed nodes (Supervisor, Research, Execution, Evaluator)

Conditional edges: Supervisor → Research (if research_needed=True) else → Execution

Linear flow: Research → Execution → Evaluator

Retry logic: Evaluator → Execution (if score < 7, once) → Evaluator

Final state: return response to user
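The routing rules above can be illustrated without LangGraph at all. The following is a minimal plain-Python sketch of the same control flow, not the contents of graphs/main_graph.py: the `RunState` fields, the `run_pipeline` function, and the stub agents are all hypothetical names chosen for illustration. Only the routing (conditional research, pass mark of 7, a single retry) comes from the description.

```python
from dataclasses import dataclass, field
from typing import Optional

PASS_SCORE = 7   # outputs scoring below this are sent back to Execution
MAX_RETRIES = 1  # the Evaluator may trigger at most one retry


@dataclass
class RunState:
    """Hypothetical shared state passed between the four agents."""
    goal: str
    research_needed: bool = False
    research: Optional[str] = None
    output: str = ""
    score: int = 0
    retries: int = 0
    trace: list = field(default_factory=list)  # order in which nodes ran


def run_pipeline(goal, supervisor, research, execution, evaluator) -> RunState:
    """Run Supervisor -> [Research] -> Execution -> Evaluator with one retry."""
    state = RunState(goal=goal)

    state.trace.append("supervisor")
    state.research_needed = supervisor(state)

    # Conditional edge: Research only runs when the Supervisor asks for it.
    if state.research_needed:
        state.trace.append("research")
        state.research = research(state)

    while True:
        state.trace.append("execution")
        state.output = execution(state)
        state.trace.append("evaluator")
        state.score = evaluator(state)
        # Stop when the output passes or the single retry is used up.
        if state.score >= PASS_SCORE or state.retries >= MAX_RETRIES:
            return state
        state.retries += 1


# Stub agents: the first draft scores 5, the retry scores 8.
_scores = iter([5, 8])
demo = run_pipeline(
    "Summarize recent framework releases",
    supervisor=lambda s: True,
    research=lambda s: "stub findings",
    execution=lambda s: f"draft {s.retries + 1}",
    evaluator=lambda s: next(_scores),
)
```

In this sketch `demo.trace` records each node visit, which makes the auditable ordering of the workflow easy to check in a unit test.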

Semantic Memory System

memory_store.py — JSON file storage (memory/memory.json, gitignored) for persistence across sessions. CRUD operations for memory entries (id,...

vector_store.py — ChromaDB (memory/chroma_db/, gitignored) with Google Embeddings for semantic similarity search. Auto-saves outputs with score ≥...
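To make the save-then-retrieve idea concrete, here is a small self-contained sketch of score-gated storage with cosine-similarity lookup. It is an illustration under stated assumptions, not the project's memory_store.py or vector_store.py: `ToyMemory` and `toy_embed` are hypothetical, the letter-count "embedding" merely stands in for Google Embeddings, and an in-memory list stands in for ChromaDB. The save threshold of 7 is taken from the outcome notes further down the page.

```python
import math

SAVE_THRESHOLD = 7  # outputs scoring 7 or higher are persisted


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text):
    """Hypothetical stand-in embedding: counts of the letters a..z."""
    text = text.lower()
    return [text.count(chr(c)) for c in range(ord("a"), ord("z") + 1)]


class ToyMemory:
    """In-memory stand-in for the vector store."""

    def __init__(self, embed):
        self.embed = embed
        self.entries = []

    def maybe_save(self, goal, output, score):
        """Store the run only if it met the quality threshold."""
        if score < SAVE_THRESHOLD:
            return False
        self.entries.append(
            {"goal": goal, "output": output, "score": score, "vector": self.embed(goal)}
        )
        return True

    def search(self, query, k=3):
        """Return the k stored entries most similar to the query."""
        qv = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e["vector"]), reverse=True)
        return ranked[:k]
```

A caller would run `maybe_save` after each evaluation and `search` before planning a new task, so that only outputs that passed evaluation can influence later runs.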

FastAPI Backend

backend/main.py — FastAPI app with CORS, health endpoint, app startup/shutdown lifecycle

backend/routers/run.py — POST /run: accepts {goal: string}, runs LangGraph workflow, returns {final_output, evaluation: {passed, score, reasons},...

backend/routers/history.py — GET /history: returns past runs from memory_store (last 50 by default)

GET /health: Status check with Google API key verification (returns set/missing, never exposes key)

GET /docs: Swagger UI for interactive API documentation
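The "set/missing, never exposes key" behaviour of the health check can be sketched as a pure function. This is an assumption about the shape of the response, not the actual handler in backend/main.py: the function name `health_payload` and the `status` and `google_api_key` field names are hypothetical.

```python
import os


def health_payload(env=None):
    """Build a health response that reports key presence, never the key itself."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "")
    return {
        "status": "ok",
        # Only the presence of the key is reported.
        "google_api_key": "set" if key.strip() else "missing",
    }
```

Taking the environment as a parameter keeps the function testable without touching real credentials; a FastAPI route could simply return its result.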

Observability (Both Optional)

LangSmith (observability/langsmith.py) — Full tracing of agent execution: each agent call, LLM invocations, timing, token counts. Enable: set...

Langfuse (observability/langfuse.py) — Metrics, evaluation scores, cost tracking per task. Enable: set LANGFUSE_SECRET_KEY + LANGFUSE_PUBLIC_KEY...

trace_utils.py — Unified helpers that work with both or neither observability platform. OBSERVABILITY_ENABLED=1 to enable both simultaneously.
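One way helpers can "work with both or neither" platform is to derive the active backends from the environment and fan events out only to those. The sketch below is an assumption about how such a helper could look, not the contents of trace_utils.py: `enabled_backends`, `record_event`, and the `sinks` mapping are hypothetical, and the OBSERVABILITY_ENABLED switch is left out because its exact semantics are not spelled out here. The environment variable names are the ones listed on this page.

```python
def enabled_backends(env):
    """Return the observability backends whose settings are present in env."""
    backends = []
    if env.get("LANGSMITH_API_KEY") and env.get("LANGCHAIN_TRACING_V2", "").lower() == "true":
        backends.append("langsmith")
    if env.get("LANGFUSE_SECRET_KEY") and env.get("LANGFUSE_PUBLIC_KEY"):
        backends.append("langfuse")
    return backends


def record_event(name, payload, env, sinks):
    """Send an event to every enabled backend; does nothing when none are configured."""
    delivered = []
    for backend in enabled_backends(env):
        sinks[backend](name, payload)  # sinks maps backend name -> callable
        delivered.append(backend)
    return delivered
```

Because `record_event` simply loops over whatever is enabled, agent code can call it unconditionally and still run with no monitoring keys set.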

Next.js 14 Frontend

page.tsx — Main page with two-panel layout (input left, results right)

AgentInput.tsx — Goal text input, Execute button, loading spinner during agent run

ResultDisplay.tsx — Final output with markdown rendering, evaluation badge (score + passed status), memory_used indicator

HistoryList.tsx — Past runs with goal preview, score, timestamp, expandable detail

TypeScript throughout with strict type checking

Responsive CSS with globals.css

Makefile Automation (16 commands)

`make dev`: Start backend (port 8000) + frontend (port 3000) with hot reload

`make backend`: Backend only (uvicorn backend.main:app --reload --port 8000)

`make frontend`: Frontend only (cd frontend && npm run dev)

`make install`: Install Python (pip install -r requirements.txt) + Node deps (cd frontend && npm install)

`make venv`: Create Python virtual environment (.venv/)

`make lint`: Run Ruff (Python) + ESLint (frontend) linting

`make format`: Auto-format Python (black/ruff --fix) + frontend (prettier)

`make test`: Run Python test suite (pytest tests/)

`make pre-push`: Run format + lint + security scan (recommended before every push)

`make security-check`: Run scripts/check-secrets.sh to scan for leaked API keys

`make check`: Verify GOOGLE_API_KEY set and Python/Node dependencies installed

`make build`: Production frontend build (cd frontend && npm run build)

`make prod`: Run backend in production mode (without --reload)

`make kill`: Kill processes on ports 8000 and 3000

`make clean`: Remove __pycache__, .next/, dist/, build artifacts

(+ venv creation and environment setup utilities)

GitHub Actions CI/CD (.github/workflows/ci.yml)

Runs on every push and PR: Python setup and dependency install, Ruff linting, pytest test suite, frontend npm install + ESLint + build, ...

Security Architecture

Zero hardcoded secrets: all credentials from environment variables only

.gitignore: .env*, .env.local, memory/memory.json, memory/chroma_db/, credentials.json, *.key, *.pem

.gitattributes for consistent line endings and diff settings

scripts/check-secrets.sh: bash script scanning for common secret patterns (GOOGLE_API_KEY=, sk-lf-, LANGSMITH)

make pre-push: combines format + lint + security-check for comprehensive pre-push validation

OFFLINE_MODE=1: skip LLM calls for testing without API costs or internet access

GEMINI_MODEL env var: override model (default: gemini-2.0-flash) for testing with different models
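The pattern-based secret scan listed above is a bash script in the project. To show the idea in the same language as the backend, here is a Python sketch of a comparable line scanner. The regular expressions are hypothetical approximations of the three pattern families named above (GOOGLE_API_KEY=, sk-lf-, LANGSMITH), not a transcription of scripts/check-secrets.sh.

```python
import re

# Hypothetical patterns: an assignment of a literal value, or a Langfuse-style key prefix.
SECRET_PATTERNS = {
    "google_api_key": re.compile(r"GOOGLE_API_KEY\s*=\s*['\"]?[A-Za-z0-9_\-]{8,}"),
    "langfuse_secret": re.compile(r"sk-lf-[A-Za-z0-9\-]{8,}"),
    "langsmith_key": re.compile(r"LANGSMITH[A-Z_]*\s*=\s*['\"]?[A-Za-z0-9_\-]{8,}"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that looks like a leaked secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Requiring a literal value after the `=` means that reading a key from the environment, such as `os.environ["GOOGLE_API_KEY"]`, is not flagged, which keeps false positives down in a pre-push hook.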

Design Decisions

Chose LangGraph over custom orchestration: state machine-based workflow with typed nodes, conditional edges, and built-in cycle support. More...

Used Google Gemini 2.0 Flash for all 4 agents: cost-effective (~$0.001/task at ~$0.00015/1K tokens), fast response times (2-5 seconds per agent call),...

Implemented Evaluator agent with 1 retry threshold (score < 7): single retry balances quality improvement with latency. Multiple retries would...

Selected ChromaDB for semantic memory: embedded (no external service), free, provides efficient cosine similarity search. JSON file for metadata...

Made observability optional: LangSmith and Langfuse both disabled by default (OBSERVABILITY_ENABLED=1 to enable). System works fully without either,...

Chose DuckDuckGo for web search: no API key required (eliminates authentication complexity), free and unlimited for development/testing, returns...

Built conditional Research agent: Supervisor decides if research needed rather than always searching. Reduces latency for knowledge tasks (math,...

Used FastAPI with Pydantic: automatic request/response validation, built-in Swagger docs (/docs), async support for concurrent requests, type hints...

Added OFFLINE_MODE environment variable: skip LLM calls for testing logic without API costs or internet. Returns mock responses for unit testing agent...

Separated backend/src/memory/tools/observability/frontend directories: clear separation of concerns, each module independently testable, easy to add...

Implemented make pre-push as a comprehensive pre-push gate: format + lint + security scan runs before pushing, catches formatting inconsistencies,...

Added GitHub PR template and issue templates: ensures consistent PR descriptions with checklist (tests pass, make pre-push run, security check done),...
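The OFFLINE_MODE and GEMINI_MODEL switches from the decisions above lend themselves to a short sketch. This is an illustration under assumptions rather than the project's configuration code: `load_settings`, `generate`, the settings keys, and the mock response format are hypothetical. Only the variable names and the gemini-2.0-flash default come from this page.

```python
import os

DEFAULT_MODEL = "gemini-2.0-flash"


def load_settings(env=None):
    """Read model and offline settings from the environment."""
    env = os.environ if env is None else env
    return {
        "model": env.get("GEMINI_MODEL", DEFAULT_MODEL),
        "offline": env.get("OFFLINE_MODE") == "1",
    }


def generate(prompt, settings, llm_call):
    """Call the LLM, or return a deterministic mock when offline."""
    if settings["offline"]:
        # No network access and no API cost: useful for testing agent logic.
        return f"[offline mock] {prompt[:40]}"
    return llm_call(settings["model"], prompt)
```

Passing `llm_call` in as a parameter means unit tests can verify that the real client is never invoked when OFFLINE_MODE=1.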

Tradeoffs & Constraints

Chose LangGraph workflow over autonomous agents: more predictable and auditable (you know exactly which agents run in what order), but less flexible...

Single retry (score < 7) instead of multiple retries: balances quality with latency (10-20 seconds per task). Multiple retries could improve quality...

ChromaDB embedded storage: perfect for single-instance deployment (no external service), but would need Pinecone/Weaviate for distributed...

DuckDuckGo web search: free and keyless but less comprehensive than paid alternatives (Google Search API, Bing API). Results can be inconsistent, no...

Memory limits (100 entries, 90-day retention): prevents unbounded storage growth, keeps retrieval fast, maintains relevance. Trade completeness for...

Optional observability (disabled by default): reduces required API keys and setup complexity, but means new deployments have no monitoring until...

Google Gemini 2.0 Flash for all agents: cost-effective (~$0.001/task) but less capable than GPT-4 or Gemini 1.5 Pro for complex reasoning tasks. Trade...

JSON file for memory persistence (memory.json): simple and portable but not suitable for concurrent access (multiple simultaneous requests could...

No real-time streaming: results return all at once after full agent pipeline completes (10-20 seconds). Token streaming would improve perceived...

Stateless FastAPI design: horizontally scalable and simple, but each request loads ChromaDB fresh. Session-based context would require persistent...

Would improve: Add SSE streaming for real-time token display, implement multi-user support with isolated memory per user, add more tools (code...
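The memory limits listed among the tradeoffs (100 entries, 90-day retention) could be enforced with a pruning pass like the one sketched here. This is a guess at one reasonable policy, not the project's implementation: the `prune` function, the `created_at` field, and the choice to keep the newest entries when over the cap are all assumptions.

```python
from datetime import datetime, timedelta, timezone

MAX_ENTRIES = 100
RETENTION = timedelta(days=90)


def prune(entries, now):
    """Drop entries older than the retention window, then keep the newest MAX_ENTRIES."""
    cutoff = now - RETENTION
    fresh = [e for e in entries if e["created_at"] >= cutoff]
    fresh.sort(key=lambda e: e["created_at"], reverse=True)  # newest first
    return fresh[:MAX_ENTRIES]
```

Running a pass like this on every write keeps both the JSON file and the retrieval candidate set bounded, at the cost of forgetting older successful runs.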

Outcome & Impact

Production-ready multi-agent AI system executing complex tasks through structured 4-agent pipeline: Supervisor (planning, memory retrieval) →...

Cost-effective execution at ~$0.001 per task (~$1/month for 1000 tasks) using Google Gemini 2.0 Flash for all 4 agents. Performance: 10-20 seconds...

3 FastAPI REST endpoints with Swagger docs: POST /run (accepts {goal: string}, returns {final_output, evaluation: {passed, score, reasons},...

ChromaDB semantic memory system: automatically saves successful outputs (score ≥ 7) with Google Embeddings, retrieves relevant past context during...

4 specialized agents implemented as separate Python modules: supervisor_agent.py (planning, routing, memory retrieval), research_agent.py (DuckDuckGo...

LangGraph state machine workflow (graphs/main_graph.py): typed nodes, conditional edges (Supervisor → Research OR Execution), retry logic (Evaluator...

Dual optional observability: LangSmith (LANGSMITH_API_KEY + LANGCHAIN_TRACING_V2=true → full agent execution tracing at smith.langchain.com),...

Next.js 14 TypeScript frontend with 3 components: AgentInput.tsx (goal input, Execute button, loading state), ResultDisplay.tsx (output with...

DuckDuckGo web search integration (tools/web_search.py): no API key required, free and unlimited, returns structured results. Only invoked when...

Makefile with 16 commands covering full development lifecycle: dev (backend + frontend with hot reload), install (Python + Node deps), venv (create...

GitHub Actions CI/CD (.github/workflows/ci.yml): runs on every push and PR with Python setup/install, Ruff lint, pytest tests, frontend npm install +...

Security-first with zero hardcoded credentials: .gitignore covers .env*, memory/memory.json, memory/chroma_db/, credentials.json, *.key, *.pem;...

Environment variable configuration: GOOGLE_API_KEY (required), GEMINI_MODEL (optional, default gemini-2.0-flash), LANGSMITH_API_KEY (optional),...

Comprehensive documentation: DEPLOYMENT.md (Vercel deployment with npm i -g vercel && vercel --prod, GCP Cloud Run with gcloud run deploy...

Clean project structure: backend/ (FastAPI app + routers), src/agentops_ai_platform/ (agents/ + graphs/), memory/ (memory_store.py +...

Tech Stack

Orchestration: LangChain, LangGraph (state machine-based multi-agent workflows with conditional edges)

LLM: Google Gemini 2.0 Flash (all 4 agents: Supervisor, Research, Execution, Evaluator)

Backend: Python 3.10+, FastAPI, Uvicorn (ASGI), Pydantic (request/response validation)

Memory: ChromaDB (vector store for semantic search), Google Embeddings (sentence embeddings)

Tools: DuckDuckGo Search (web research, no API key required)

Observability: LangSmith (agent tracing, optional), Langfuse (metrics/cost, optional)

Frontend: Next.js 14 (App Router), React 18, TypeScript (strict mode)

CI/CD: GitHub Actions (lint/test/build/security scan on push and PR)

Security: scripts/check-secrets.sh (pre-push secret scanner), OFFLINE_MODE for testing

Automation: Makefile (16 commands for dev/install/quality/security/build/maintenance)
