AgentOps AI Platform
Production-ready multi-agent AI system built with LangChain, LangGraph, and FastAPI. Four specialized agents: Supervisor (planning/routing), Research (conditional web search via DuckDuckGo), Execution (output generation), and Evaluator (quality scoring 1-10, with one retry if score < 7).
Role
AI Engineer & Full-stack Developer
Team
Solo
Company/Organization
Personal Project (Self-Study)
The Problem
Complex AI tasks require multi-step planning, research, execution, and quality validation that single LLM calls cannot handle — no structured way to...
Building production-ready agent systems with proper observability (tracing, metrics, cost tracking) requires integrating multiple monitoring...
No easy way to implement semantic memory that automatically persists high-quality outputs and retrieves relevant context for future tasks — reducing...
Quality control for AI outputs requires automatic evaluation and retry mechanisms, but most agent frameworks don't include self-evaluation with...
Agent systems often require expensive API keys for every tool (search APIs, news APIs) — needed free, keyless alternatives that work reliably for...
Development workflow for multi-component systems (Python backend + Node frontend + observability) lacks unified tooling — needed consistent commands...
Security for AI systems with API keys requires comprehensive protection: gitignored secrets, pre-push scanning, CI/CD secret detection, and offline...
The Solution
Built a production-ready multi-agent AI platform with LangGraph state machine orchestration and comprehensive tooling.
Agent Pipeline (LangGraph Workflow)
User Goal → Supervisor → [Research if needed] → Execution → Evaluator → Response (retry if score < 7)
Supervisor Agent (supervisor_agent.py) — Creates step-by-step execution plan from user goal. Decides if research is needed (queries requiring...
Research Agent (research_agent.py) — Conditional: only runs when Supervisor requests. Uses DuckDuckGo web search (tools/web_search.py) to gather...
Execution Agent (execution_agent.py) — Generates final output following Supervisor's plan and Research findings (if available). Produces...
Evaluator Agent (evaluator_agent.py) — Scores output quality 1-10 with reasoning: clarity (1-3), accuracy (1-3), structure (1-3), completeness...
LangGraph Orchestration
(graphs/main_graph.py):
State machine with typed nodes (Supervisor, Research, Execution, Evaluator)
Conditional edges: Supervisor → Research (if research_needed=True) else → Execution
Linear flow: Research → Execution → Evaluator
Retry logic: Evaluator → Execution (if score < 7, once) → Evaluator
Final state: return response to user
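The routing and retry rules above can be sketched as plain Python control flow. This is a minimal sketch only: the real implementation builds a LangGraph state machine in graphs/main_graph.py, and the agent callables here are hypothetical stand-ins passed in for illustration.

```python
def run_pipeline(goal, supervisor, research, execute, evaluate, pass_threshold=7):
    """Sketch of the workflow: plan, conditionally research, execute, evaluate,
    and retry execution once if the score is below the threshold."""
    plan = supervisor(goal)                       # Supervisor: plan + research decision
    findings = research(goal) if plan["research_needed"] else None  # conditional Research
    output = execute(goal, plan, findings)        # Execution: produce the output
    score = evaluate(output)                      # Evaluator: 1-10 quality score
    if score < pass_threshold:                    # single retry on a low score
        output = execute(goal, plan, findings)
        score = evaluate(output)
    return {"final_output": output, "score": score, "passed": score >= pass_threshold}
```

The single-retry rule mirrors the Evaluator → Execution edge described above: one extra pass bounds latency while still giving low-scoring outputs a second chance.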
Semantic Memory System
memory_store.py — JSON file storage (memory/memory.json, gitignored) for persistence across sessions. CRUD operations for memory entries (id,...
vector_store.py — ChromaDB (memory/chroma_db/, gitignored) with Google Embeddings for semantic similarity search. Auto-saves outputs with score ≥...
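The save/retrieve policy can be illustrated with a tiny in-memory version. This is a sketch only: the real store uses ChromaDB with Google Embeddings, and the `embed` callable here is a placeholder injected by the caller.

```python
import math

PASS_SCORE = 7  # only outputs scoring >= 7 are persisted

def cosine(a, b):
    """Cosine similarity between two vectors (what ChromaDB computes for us)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class TinyMemory:
    """In-memory stand-in for the ChromaDB store (illustration only)."""
    def __init__(self, embed):
        self.embed = embed      # placeholder for Google Embeddings
        self.entries = []       # list of (vector, text) pairs

    def maybe_save(self, text, score):
        if score >= PASS_SCORE:               # high-quality outputs persist
            self.entries.append((self.embed(text), text))

    def retrieve(self, query, k=3):
        qv = self.embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(qv, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]
```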
FastAPI Backend
backend/main.py — FastAPI app with CORS, health endpoint, app startup/shutdown lifecycle
backend/routers/run.py — POST /run: accepts {goal: string}, runs LangGraph workflow, returns {final_output, evaluation: {passed, score, reasons},...
backend/routers/history.py — GET /history: returns past runs from memory_store (last 50 by default)
GET /health — Status check with Google API key verification (returns set/missing, never exposes key)
GET /docs — Swagger UI for interactive API documentation
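The POST /run response shape can be sketched with stdlib dataclasses. The real backend uses Pydantic models for validation; only the fields named above (final_output, evaluation.passed/score/reasons) are taken from the source, and any further fields are unknown.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Evaluation:
    passed: bool
    score: int                                  # 1-10 from the Evaluator agent
    reasons: list[str] = field(default_factory=list)

@dataclass
class RunResponse:
    final_output: str
    evaluation: Evaluation

# asdict() yields the JSON-serializable shape returned by POST /run
resp = RunResponse("Done.", Evaluation(passed=True, score=8, reasons=["clear"]))
payload = asdict(resp)
```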
Observability (Both Optional)
LangSmith (observability/langsmith.py) — Full tracing of agent execution: each agent call, LLM invocations, timing, token counts. Enable: set...
Langfuse (observability/langfuse.py) — Metrics, evaluation scores, cost tracking per task. Enable: set LANGFUSE_SECRET_KEY + LANGFUSE_PUBLIC_KEY...
trace_utils.py — Unified helpers that work with both or neither observability platform. OBSERVABILITY_ENABLED=1 to enable both simultaneously.
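A minimal sketch of how such a no-op-by-default helper can work; the decorator name and behavior here are assumptions for illustration, not the actual trace_utils.py API.

```python
import functools
import os
import time

def traced(fn):
    """Wrap an agent call with tracing only when observability is enabled.

    When OBSERVABILITY_ENABLED is unset (the default), the function runs
    untouched; no LangSmith/Langfuse client is ever imported or contacted.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if os.environ.get("OBSERVABILITY_ENABLED") != "1":
            return fn(*args, **kwargs)          # observability off: plain call
        start = time.perf_counter()             # observability on: time the call
        try:
            return fn(*args, **kwargs)
        finally:
            # A real helper would forward a span to LangSmith and/or Langfuse here.
            print(f"{fn.__name__} took {time.perf_counter() - start:.3f}s")
    return wrapper
```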
Next.js 14 Frontend
page.tsx — Main page with two-panel layout (input left, results right)
AgentInput.tsx — Goal text input, Execute button, loading spinner during agent run
ResultDisplay.tsx — Final output with markdown rendering, evaluation badge (score + passed status), memory_used indicator
HistoryList.tsx — Past runs with goal preview, score, timestamp, expandable detail
TypeScript throughout with strict type checking
Responsive CSS with globals.css
Makefile Automation (16 commands)
`make dev` — Start backend (port 8000) + frontend (port 3000) with hot reload
`make backend` — Backend only (uvicorn backend.main:app --reload --port 8000)
`make frontend` — Frontend only (cd frontend && npm run dev)
`make install` — Install Python (pip install -r requirements.txt) + Node deps (cd frontend && npm install)
`make venv` — Create Python virtual environment (.venv/)
`make lint` — Run Ruff (Python) + ESLint (frontend) linting
`make format` — Auto-format Python (black/ruff --fix) + frontend (prettier)
`make test` — Run Python test suite (pytest tests/)
`make pre-push` — Run format + lint + security scan (recommended before every push)
`make security-check` — Run scripts/check-secrets.sh to scan for leaked API keys
`make check` — Verify GOOGLE_API_KEY set and Python/Node dependencies installed
`make build` — Production frontend build (cd frontend && npm run build)
`make prod` — Run backend in production mode (without --reload)
`make kill` — Kill processes on ports 8000 and 3000
`make clean` — Remove __pycache__, .next/, dist/, build artifacts
(+ venv creation and environment setup utilities)
GitHub Actions CI/CD (.github/workflows/ci.yml)
Runs on every push and PR:
Python setup and dependency install
Ruff linting
pytest test suite
Frontend npm install + ESLint + build
...
Security Architecture
Zero hardcoded secrets — all credentials from environment variables only
.gitignore: .env*, .env.local, memory/memory.json, memory/chroma_db/, credentials.json, *.key, *.pem
.gitattributes for consistent line endings and diff settings
scripts/check-secrets.sh: bash script scanning for common secret patterns (GOOGLE_API_KEY=, sk-lf-, LANGSMITH)
make pre-push: combines format + lint + security-check for comprehensive pre-push validation
OFFLINE_MODE=1: skip LLM calls for testing without API costs or internet access
GEMINI_MODEL env var: override model (default: gemini-2.0-flash) for testing with different models
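The OFFLINE_MODE switch can be implemented as a thin gate in front of the LLM client. This is a sketch under assumptions: the function name and mock text are hypothetical, not the project's actual code.

```python
import os

def call_llm(prompt, real_call):
    """Route a prompt to the real model, or return a mock when offline.

    real_call is the actual client invocation (e.g. a Gemini call); passing it
    in means tests can run with OFFLINE_MODE=1 and never touch the network.
    """
    if os.environ.get("OFFLINE_MODE") == "1":
        return f"[offline mock] {prompt[:40]}"   # deterministic stand-in reply
    return real_call(prompt)
```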
Design Decisions
Chose LangGraph over custom orchestration — state machine-based workflow with typed nodes, conditional edges, and built-in cycle support. More...
Used Google Gemini 2.0 Flash for all 4 agents — cost-effective (~$0.001/task at ~$0.00015/1K tokens), fast response times (2-5 seconds per agent call),...
Implemented Evaluator agent with 1 retry threshold (score < 7) — single retry balances quality improvement with latency. Multiple retries would...
Selected ChromaDB for semantic memory — embedded (no external service), free, provides efficient cosine similarity search. JSON file for metadata...
Made observability optional — LangSmith and Langfuse both disabled by default (OBSERVABILITY_ENABLED=1 to enable). System works fully without either,...
Chose DuckDuckGo for web search — no API key required (eliminates authentication complexity), free and unlimited for development/testing, returns...
Built conditional Research agent — Supervisor decides if research needed rather than always searching. Reduces latency for knowledge tasks (math,...
Used FastAPI with Pydantic — automatic request/response validation, built-in Swagger docs (/docs), async support for concurrent requests, type hints...
Added OFFLINE_MODE environment variable — skip LLM calls for testing logic without API costs or internet. Returns mock responses for unit testing agent...
Separated backend/src/memory/tools/observability/frontend directories — clear separation of concerns, each module independently testable, easy to add...
Implemented make pre-push as comprehensive pre-commit gate — format + lint + security scan runs before pushing, catches formatting inconsistencies,...
Added GitHub PR template and issue templates — ensures consistent PR descriptions with checklist (tests pass, make pre-push run, security check done),...
Tradeoffs & Constraints
Chose LangGraph workflow over autonomous agents — more predictable and auditable (you know exactly which agents run in what order), but less flexible...
Single retry (score < 7) instead of multiple retries — balances quality with latency (10-20 seconds per task). Multiple retries could improve quality...
ChromaDB embedded storage — perfect for single-instance deployment (no external service), but would need Pinecone/Weaviate for distributed...
DuckDuckGo web search — free and keyless but less comprehensive than paid alternatives (Google Search API, Bing API). Results can be inconsistent, no...
Memory limits (100 entries, 90-day retention) — prevents unbounded storage growth, keeps retrieval fast, maintains relevance. Trade completeness for...
Optional observability (disabled by default) — reduces required API keys and setup complexity, but means new deployments have no monitoring until...
Google Gemini 2.0 Flash for all agents — cost-effective (~$0.001/task) but less capable than GPT-4 or Gemini 1.5 Pro for complex reasoning tasks. Trade...
JSON file for memory persistence (memory.json) — simple and portable but not suitable for concurrent access (multiple simultaneous requests could...
No real-time streaming — results return all at once after full agent pipeline completes (10-20 seconds). Token streaming would improve perceived...
Stateless FastAPI design — horizontally scalable and simple, but each request loads ChromaDB fresh. Session-based context would require persistent...
Would improve: Add SSE streaming for real-time token display, implement multi-user support with isolated memory per user, add more tools (code...
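The 100-entry / 90-day retention policy above can be sketched as a simple pruning pass; the entry shape and function name are assumptions, not the project's actual memory_store.py code.

```python
from datetime import datetime, timedelta, timezone

MAX_ENTRIES = 100
MAX_AGE = timedelta(days=90)

def prune(entries, now=None):
    """Drop entries older than 90 days, then keep only the newest 100.

    Each entry is assumed to be a dict with a timezone-aware 'created_at'.
    """
    now = now or datetime.now(timezone.utc)
    fresh = [e for e in entries if now - e["created_at"] <= MAX_AGE]
    fresh.sort(key=lambda e: e["created_at"], reverse=True)  # newest first
    return fresh[:MAX_ENTRIES]
```

Age-based pruning runs first so a burst of recent entries cannot keep stale context alive past the retention window.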
Outcome & Impact
Production-ready multi-agent AI system executing complex tasks through structured 4-agent pipeline: Supervisor (planning, memory retrieval) →...
Cost-effective execution at ~$0.001 per task (~$1/month for 1000 tasks) using Google Gemini 2.0 Flash for all 4 agents. Performance: 10-20 seconds...
3 FastAPI REST endpoints with Swagger docs: POST /run (accepts {goal: string}, returns {final_output, evaluation: {passed, score, reasons},...
ChromaDB semantic memory system: automatically saves successful outputs (score ≥ 7) with Google Embeddings, retrieves relevant past context during...
4 specialized agents implemented as separate Python modules: supervisor_agent.py (planning, routing, memory retrieval), research_agent.py (DuckDuckGo...
LangGraph state machine workflow (graphs/main_graph.py): typed nodes, conditional edges (Supervisor → Research OR Execution), retry logic (Evaluator...
Dual optional observability: LangSmith (LANGSMITH_API_KEY + LANGCHAIN_TRACING_V2=true → full agent execution tracing at smith.langchain.com),...
Next.js 14 TypeScript frontend with 3 components: AgentInput.tsx (goal input, Execute button, loading state), ResultDisplay.tsx (output with...
DuckDuckGo web search integration (tools/web_search.py) — no API key required, free and unlimited, returns structured results. Only invoked when...
Makefile with 16 commands covering full development lifecycle: dev (backend + frontend with hot reload), install (Python + Node deps), venv (create...
GitHub Actions CI/CD (.github/workflows/ci.yml): runs on every push and PR with Python setup/install, Ruff lint, pytest tests, frontend npm install +...
Security-first with zero hardcoded credentials: .gitignore covers .env*, memory/memory.json, memory/chroma_db/, credentials.json, *.key, *.pem;...
Environment variable configuration: GOOGLE_API_KEY (required), GEMINI_MODEL (optional, default gemini-2.0-flash), LANGSMITH_API_KEY (optional),...
Comprehensive documentation: DEPLOYMENT.md (Vercel deployment with npm i -g vercel && vercel --prod, GCP Cloud Run with gcloud run deploy...
Clean project structure: backend/ (FastAPI app + routers), src/agentops_ai_platform/ (agents/ + graphs/), memory/ (memory_store.py +...
Tech Stack
Orchestration: LangChain, LangGraph (state machine-based multi-agent workflows with conditional edges)
LLM: Google Gemini 2.0 Flash (all 4 agents — Supervisor, Research, Execution, Evaluator)
Backend: Python 3.10+, FastAPI, Uvicorn (ASGI), Pydantic (request/response validation)
Memory: ChromaDB (vector store for semantic search), Google Embeddings (sentence embeddings)
Tools: DuckDuckGo Search (web research — no API key required)
Observability: LangSmith (agent tracing — optional), Langfuse (metrics/cost tracking — optional)
Frontend: Next.js 14 (App Router), React 18, TypeScript (strict mode)
CI/CD: GitHub Actions (lint/test/build/security scan on push and PR)
Security: scripts/check-secrets.sh (pre-push secret scanner), OFFLINE_MODE for testing
Automation: Makefile (16 commands for dev/install/quality/security/build/maintenance)