AI Security Scanner
AI-powered API security scanner with a 5-stage pipeline: OWASP ZAP passive spider → concurrent endpoint discovery (sitemap + 150 common paths, async httpx) → custom rule checks (missing auth, SQLi, open admin, headers, sensitive data, prompt injection) → GPT-4o-mini AI risk analysis → JSON + Markdown report. Redis queue for background deep scanning via ai-sec-worker.
Role
Security Engineer
Team
Solo
Company/Organization
Personal Project
The Problem
Security teams scanning APIs and web targets needed a unified workflow combining passive recon, rule-based checks, and AI-driven analysis — but no...
OWASP ZAP alone only covers passive/active scanning patterns but misses application-level logic issues like missing authentication on sensitive paths...
Custom rule-based scanners (manual scripts) are inconsistent, miss nuanced risks, and produce unstructured output that's hard to triage. No...
Discovered endpoints needed deep scanning without blocking the main scan pipeline — no background worker queue meant sequential processing and slow...
GPT-4 had no integration path into security tooling for flagging risks that rule-based systems miss (e.g., publicly accessible database schema files,...
The Solution
Built a modular Python CLI security scanner with a 5-stage pipeline and optional background worker.
CLI Entry Point (ai_sec_scan/cli.py)
`ai-sec-scan scan <target>` — orchestrates the full 5-stage pipeline with rich terminal output.
`ai-sec-worker` — runs the Redis background worker for deep scanning queued endpoints.
typer for CLI definition, rich for colored terminal output with progress indicators.
Stage 1 — OWASP ZAP Passive Spider (scanner/zap_scanner.py)
Connects to ZAP daemon via python-owasp-zap-v2.4 API client.
Runs passive spider on target URL, waits for completion.
Fetches ZAP alerts (missing Content-Security-Policy, directory browsing, missing anti-CSRF tokens, vulnerable JS libraries, insecure cookies, server version leak).
Gracefully skips if ZAP is not running — scanner continues without ZAP findings.
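The graceful-skip behavior of this stage can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `run_zap_stage` and the `poll_interval` parameter are assumptions, while the `spider.scan`/`spider.status`/`core.alerts` calls match the python-owasp-zap-v2.4 client (in the real scanner the `zap` object would be a `zapv2.ZAPv2` instance).

```python
import time

def run_zap_stage(zap, target, poll_interval=2.0):
    """Stage 1 sketch: run ZAP's spider against `target` and collect
    passive-scan alerts. Returns [] when the ZAP daemon is unreachable
    so the rest of the pipeline can continue without ZAP findings."""
    if zap is None:
        return []
    try:
        scan_id = zap.spider.scan(target)           # kick off the spider
        while int(zap.spider.status(scan_id)) < 100:
            time.sleep(poll_interval)               # wait for completion
        return zap.core.alerts(baseurl=target)      # passive-scan alerts
    except Exception:
        return []  # ZAP not running: skip gracefully
```

Swallowing the connection error here is what lets `ai-sec-scan scan` work with no ZAP daemon at all.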
Stage 2 — Endpoint Discovery (scanner/discovery.py)
Fetches and parses sitemap.xml for known URLs.
Concurrently probes 150 common paths (/admin, /api, /login, /config, /debug, etc.) using httpx async with configurable concurrency.
Returns list of live endpoints (HTTP 200/301/302 responses).
Configurable timeout and headers in config.py.
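The concurrent probe in this stage can be sketched with asyncio and a semaphore. The `discover` and `probe` names are illustrative, and `COMMON_PATHS` shows only a handful of the ~150 paths; in the real scanner `probe` would wrap `httpx.AsyncClient.get()`, which is injected here so the concurrency logic stands alone.

```python
import asyncio

COMMON_PATHS = ["/admin", "/api", "/login", "/config", "/debug"]  # subset of ~150
LIVE_STATUSES = {200, 301, 302}

async def discover(base_url, paths, probe, concurrency=10):
    """Stage 2 sketch: probe candidate paths concurrently, keep live ones.
    `probe(url)` is an async callable returning an HTTP status code; a
    semaphore caps in-flight requests at `concurrency`."""
    sem = asyncio.Semaphore(concurrency)

    async def check(path):
        url = base_url.rstrip("/") + path
        async with sem:
            status = await probe(url)
        return url if status in LIVE_STATUSES else None

    results = await asyncio.gather(*(check(p) for p in paths))
    return [u for u in results if u]
```

Bounding concurrency with a semaphore is what makes the probe rate configurable, which matters for the rate-limiting/WAF tradeoff noted below.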
Stage 3 — Custom Rule Checks (checks/)
basic_checks.py — Missing authentication detection (checks if sensitive paths return 200 without auth headers), SQL injection signal detection...
advanced_checks.py — Security headers check (X-Frame-Options, X-Content-Type-Options, Strict-Transport-Security, Content-Security-Policy),...
Each check returns list of findings with severity (HIGH/MEDIUM/LOW) and endpoint URL.
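One of the advanced checks, the security-headers rule, can be sketched like this. The function name and the per-header severities are assumptions; the finding shape (severity/endpoint/reason dict) mirrors the report structure described in this write-up.

```python
# Severities here are illustrative defaults, not the project's actual values.
REQUIRED_HEADERS = {
    "X-Frame-Options": "MEDIUM",
    "X-Content-Type-Options": "LOW",
    "Strict-Transport-Security": "MEDIUM",
    "Content-Security-Policy": "MEDIUM",
}

def check_security_headers(url, headers):
    """Return one finding per missing security header; comparison is
    case-insensitive, as HTTP header names are."""
    present = {h.lower() for h in headers}
    return [
        {"severity": sev, "endpoint": url, "reason": f"Missing {name} header"}
        for name, sev in REQUIRED_HEADERS.items()
        if name.lower() not in present
    ]
```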
Stage 4 — AI Analysis (llm/analyzer.py)
GPT-4o-mini reads raw HTTP responses from discovered endpoints.
Prompt instructs GPT-4o-mini to identify nuanced risks: publicly accessible files, misconfigured access controls, information disclosure, logic flaws not...
Returns AI findings with severity and detailed reason text.
Skips gracefully if OPENAI_API_KEY not set.
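The shape of this stage can be sketched as prompt construction plus defensive parsing of the model's reply. Everything here is illustrative: `build_prompt`, `analyze_endpoint`, `max_body`, and the prompt wording are assumptions, and the actual chat-completion call (e.g. via the OpenAI client) is abstracted as an injected `complete(prompt)` callable.

```python
import json

def build_prompt(url, raw_response, max_body=4000):
    """Build the analysis prompt; the body is truncated so large
    responses stay within the model's context window."""
    return (
        "You are an API security analyst. Review this raw HTTP response and "
        "list risks a rule-based scanner would miss (exposed files, "
        "misconfigured access controls, information disclosure). Respond as "
        'JSON: {"findings": [{"severity": "HIGH|MEDIUM|LOW", "reason": "..."}]}\n'
        f"URL: {url}\n---\n{raw_response[:max_body]}"
    )

def analyze_endpoint(url, raw_response, complete):
    """`complete(prompt)` wraps the chat-completion call (e.g. GPT-4o-mini);
    a malformed reply yields no findings rather than crashing the scan."""
    try:
        reply = complete(build_prompt(url, raw_response))
        findings = json.loads(reply).get("findings", [])
    except (json.JSONDecodeError, AttributeError, TypeError):
        return []
    return [{**f, "endpoint": url} for f in findings]
```

Tolerating malformed model output is the same graceful-degradation principle applied to ZAP and Redis elsewhere in the pipeline.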
Stage 5 — Report Generation (reports/generator.py)
Aggregates findings from all stages (ZAP + custom + AI) into unified structure.
Writes security_report.json (machine-readable, all findings with severity/endpoint/reason).
Writes security_report.md (human-readable Markdown with tables and finding details).
Reports the total finding count and the number of endpoints queued.
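The dual-format output can be sketched as below. The `write_reports` name and the exact JSON/Markdown layout are assumptions; the filenames and the severity/endpoint/reason fields come from this write-up.

```python
import json
from pathlib import Path

def write_reports(findings, out_dir="."):
    """Stage 5 sketch: write the aggregated findings (ZAP + custom + AI)
    as machine-readable JSON and human-readable Markdown."""
    out = Path(out_dir)
    (out / "security_report.json").write_text(
        json.dumps({"total": len(findings), "findings": findings}, indent=2)
    )
    lines = [
        "# Security Report", "",
        f"**Total findings:** {len(findings)}", "",
        "| Severity | Endpoint | Reason |", "| --- | --- | --- |",
    ]
    lines += [f"| {f['severity']} | {f['endpoint']} | {f['reason']} |" for f in findings]
    (out / "security_report.md").write_text("\n".join(lines) + "\n")
```

Because every stage emits the same finding dict, this writer needs no per-stage logic.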
Redis Background Worker (scanner/queue.py + scanner/worker.py)
scanner/queue.py — `enqueue(endpoint)` pushes discovered endpoints to Redis list. `dequeue()` pops for processing.
scanner/worker.py — ai-sec-worker polls Redis queue, deep-scans each endpoint (full checks pipeline), appends findings to report.
Optional: scanner runs without Redis, endpoints just not queued.
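The queue functions can be sketched as thin wrappers over Redis list commands. The `QUEUE_KEY` name is hypothetical; in the project the `client` would be a `redis.Redis(host=..., port=...)` instance, and passing `None` models the Redis-unavailable case.

```python
QUEUE_KEY = "ai_sec_scan:endpoints"  # hypothetical key name

def enqueue(client, endpoint):
    """Push a discovered endpoint for background deep scanning; a no-op
    when Redis is unavailable (client is None)."""
    if client is None:
        return False
    client.rpush(QUEUE_KEY, endpoint)  # append to the Redis list (FIFO tail)
    return True

def dequeue(client):
    """Pop the next endpoint (FIFO head), or None when the queue is empty."""
    return client.lpop(QUEUE_KEY) if client is not None else None
```

`rpush`/`lpop` gives FIFO ordering, so the worker processes endpoints in the order they were discovered.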
Config (config.py)
Global timeout, default headers (User-Agent, Accept), ZAP API URL, Redis connection settings.
OPENAI_API_KEY, REDIS_HOST, REDIS_PORT loaded from .env via python-dotenv.
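A config module along these lines would cover the settings listed above. The default values and the `SCAN_TIMEOUT`/`ZAP_API_URL` variable names are illustrative; the real config.py would also call python-dotenv's `load_dotenv()` before reading the environment, which is omitted here.

```python
import os

# Defaults are illustrative; the real values live in config.py / .env.
TIMEOUT = float(os.getenv("SCAN_TIMEOUT", "10"))
DEFAULT_HEADERS = {"User-Agent": "ai-sec-scan/0.1", "Accept": "*/*"}
ZAP_API_URL = os.getenv("ZAP_API_URL", "http://127.0.0.1:8080")
REDIS_HOST = os.getenv("REDIS_HOST", "localhost")
REDIS_PORT = int(os.getenv("REDIS_PORT", "6379"))
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")  # None => AI stage is skipped
```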
Makefile Automation (10 commands)
`make install` — Poetry install all deps including dev.
`make lint` / `make lint-fix` — ruff check / auto-fix.
`make format` — ruff format.
`make test` / `make test-cov` — pytest / pytest with coverage.
`make ci` — full lint + test pipeline.
`make scan TARGET=<url>` — run scanner against target.
`make worker` — start background worker.
`make clean` — remove build artifacts.
GitHub Actions CI (.github/workflows/ci.yml)
Runs on every push and PR to main/master:
Lint — ruff check on Python 3.11.
Test — pytest across Python 3.11 and 3.12.
Design Decisions
5-stage pipeline with graceful degradation — ZAP and Redis are optional; if ZAP is not running or Redis is unavailable, the scanner skips those...
Separated checks into basic_checks.py (auth/SQLi/admin) and advanced_checks.py (headers/sensitive data/prompt injection) — keeps each file focused...
Redis queue for background deep scanning — main scan pipeline stays fast; discovered endpoints are queued and processed asynchronously by...
GPT-4o-mini reads raw HTTP responses rather than structured data — the LLM can identify patterns humans and rules miss (publicly accessible schema files,...
typer + rich for CLI — typer provides clean command definition with type hints and --help generation; rich enables colored, structured terminal...
python-owasp-zap-v2.4 API client over ZAP REST directly — official Python client handles ZAP API authentication, polling, and response parsing,...
Unified JSON + Markdown report output — JSON for programmatic consumption (CI/CD integration, SIEM ingestion), Markdown for human review in GitHub...
Poetry for dependency management — lockfile ensures reproducible installs across environments; pyproject.toml separates dev dependencies (ruff,...
ruff over flake8 + black — single tool for both linting and formatting, 10-100x faster than legacy tools, single configuration in pyproject.toml.
MIT license — enables security researchers, students, and teams to use, modify, and integrate the scanner without restrictions.
Tradeoffs & Constraints
GPT-4o-mini analysis cost — each scan calls the OpenAI API for every discovered endpoint. Cost scales with endpoint count. Mitigated by optional OPENAI_API_KEY...
ZAP daemon must be running separately — requires user to start ZAP with `zap.sh -daemon` before scanning. Automated ZAP startup would add Docker...
Rule-based checks are heuristic — SQL injection detection looks for error message patterns, not actual injection confirmation. False positives...
Concurrent endpoint probing may trigger rate limiting or WAF blocks on target — configurable concurrency helps but aggressive scanning can still be...
Redis worker processes endpoints sequentially — parallel workers would improve throughput but require distributed locking to avoid duplicate...
Would improve: Active injection probing mode (send crafted payloads for confirmed SQLi/XSS), authenticated scanning (session cookie/API key...
Outcome & Impact
Production CLI security scanner running a unified 5-stage pipeline (ZAP → discovery → custom checks → AI analysis → report) with a single command:...
13 security check categories across ZAP passive scanning and custom rule checks: Content-Security-Policy missing, X-Frame-Options/HSTS missing,...
GPT-4o-mini AI risk analysis layer flags issues rule-based systems miss — publicly accessible database schema files, directory listings with no access...
Structured output in two formats: security_report.json (machine-readable, CI/CD integration ready) and security_report.md (human-readable Markdown...
Redis background worker (ai-sec-worker) enables async deep scanning of discovered endpoints without blocking the main pipeline — endpoints queued via...
Graceful degradation — ZAP not running: skips ZAP stage, continues with discovery + checks + AI. Redis unavailable: skips queuing, endpoints not...
GitHub Actions CI on every push and PR: ruff lint on Python 3.11, pytest across Python 3.11 and 3.12, ensuring clean code and passing tests across...
Makefile automation with 10 commands covers full development lifecycle: install, lint, format, test, ci, scan, worker, clean.
Tech Stack
Language: Python 3.11+, Poetry (dependency management + packaging)
HTTP: httpx (async HTTP requests for concurrent endpoint discovery and checks)
CLI: typer (command definition with type hints), rich (colored terminal output, progress indicators)
Security: python-owasp-zap-v2.4 (ZAP API client for passive spider + alerts)
AI Analysis: OpenAI GPT-4o-mini (nuanced vulnerability detection from raw HTTP responses)
Queue: Redis (endpoint queue for background deep scanning via ai-sec-worker)
Config: python-dotenv (.env loading for OPENAI_API_KEY, REDIS_HOST, REDIS_PORT)
Linting: ruff (lint + format, replaces flake8 + black)
Testing: pytest, pytest-asyncio (async test support)
CI/CD: GitHub Actions (ruff lint + pytest on Python 3.11 and 3.12)
Automation: Makefile (10 commands: install, lint, lint-fix, format, test, test-cov, ci, scan, worker, clean)