A working reference lab that shows how identity-aware classification, DLP, model routing, and per-user cost attribution compose into one defensible AI gateway. Walkthrough runs ~5 minutes.
Every request carries a JWT with max_classification, token_budget_daily, and allowed_mcp_servers claims. No claims, no access.
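A minimal sketch of the "no claims, no access" rule, assuming the JWT signature has already been verified upstream and the gateway receives the decoded claims as a dict. The names `require_claims` and `ClaimsError` are hypothetical, not taken from the lab.

```python
# The three claim names come from the text above; everything else is illustrative.
REQUIRED_CLAIMS = ("max_classification", "token_budget_daily", "allowed_mcp_servers")


class ClaimsError(Exception):
    """Raised when a decoded token lacks a claim the gateway requires."""


def require_claims(claims: dict) -> dict:
    """Return only the gateway-relevant claims, or raise if any is missing."""
    missing = [name for name in REQUIRED_CLAIMS if name not in claims]
    if missing:
        raise ClaimsError("missing claims: " + ", ".join(missing))
    return {name: claims[name] for name in REQUIRED_CLAIMS}
```

A caller would map `ClaimsError` to a rejection before any classification or routing runs.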
Prompts are auto-classified by content. A prompt classified above the caller's max_classification returns 403 before any model ever sees the payload.
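The tier check can be sketched as below. The tier names, their ordering, and the keyword rules are assumptions made for illustration; the lab's actual classifier is not described here and is likely more sophisticated than keyword matching.

```python
# Hypothetical tier ordering, lowest to highest sensitivity.
TIERS = ["public", "internal", "confidential", "restricted"]

# Hypothetical keyword rules standing in for a real content classifier.
KEYWORDS = {
    "restricted": ("merger", "acquisition"),
    "confidential": ("salary", "customer list"),
    "internal": ("roadmap", "internal"),
}


def classify(prompt: str) -> str:
    """Return the highest tier whose keywords appear in the prompt."""
    text = prompt.lower()
    for tier in ("restricted", "confidential", "internal"):
        if any(word in text for word in KEYWORDS[tier]):
            return tier
    return "public"


def authorize(prompt: str, max_classification: str) -> int:
    """Return the HTTP status the gateway would send: 200 to proceed, 403 to block."""
    if TIERS.index(classify(prompt)) > TIERS.index(max_classification):
        return 403
    return 200
```

Because `authorize` runs before any model call, a blocked prompt never leaves the gateway.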
Both the request and the response are scanned. Personal identifiers, secrets, and restricted markers are caught before they reach the model or the user.
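A rough sketch of a scan that can run on either side of the model call. The patterns below (email, US SSN format, AWS-style access key ID, a `[RESTRICTED]` marker) are illustrative assumptions, not the lab's actual rule set.

```python
import re

# Hypothetical detection rules; a production DLP engine would use many more.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "restricted_marker": re.compile(r"\[RESTRICTED\]"),
}


def scan(text: str) -> list:
    """Return the names of all rules that match the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def redact(text: str) -> str:
    """Replace every match with a placeholder naming the rule that fired."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name}>", text)
    return text
```

The same `scan` call would be applied to the prompt before it goes to the model and to the completion before it goes back to the user.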
Models are selected by prompt complexity and data classification: a cheap model for trivial queries, Opus only when reasoning demands it.
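One way such a router could look, as a sketch only. The complexity heuristic, the threshold, and the model labels `small-model` and `on-prem-model` are hypothetical, and so is the rule that restricted data is pinned to a separate model; only "Opus" is named in the text above.

```python
# Hypothetical phrases that suggest a prompt needs multi-step reasoning.
REASONING_HINTS = ("prove", "step by step", "trade-off", "analyze", "design")


def complexity(prompt: str) -> int:
    """Crude score: one point per 50 words, two per reasoning hint found."""
    text = prompt.lower()
    score = len(text.split()) // 50
    score += sum(2 for hint in REASONING_HINTS if hint in text)
    return score


def route(prompt: str, classification: str) -> str:
    """Pick a model label from data classification first, then complexity."""
    if classification == "restricted":
        # Assumption: restricted data may only go to a designated model.
        return "on-prem-model"
    return "opus" if complexity(prompt) >= 2 else "small-model"
```

Checking classification before complexity means a sensitive prompt can never be routed up to a model it is not cleared for, however hard the question is.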
Every request is logged with user_id, model, tokens, and USD cost. Grafana surfaces org-wide and per-user spend in real time.
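The per-request record and per-user rollup might be sketched like this. The prices are placeholder values, not real vendor rates, and the function names are hypothetical; the record fields mirror the ones listed above.

```python
from collections import defaultdict

# Hypothetical USD prices per million tokens as (input, output).
PRICE_PER_MTOK = {"small-model": (1.0, 5.0), "opus": (15.0, 75.0)}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Compute the USD cost of one request from its token counts."""
    price_in, price_out = PRICE_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def log_record(user_id: str, model: str, input_tokens: int, output_tokens: int) -> dict:
    """Build the log entry: user_id, model, total tokens, and USD cost."""
    return {
        "user_id": user_id,
        "model": model,
        "tokens": input_tokens + output_tokens,
        "cost_usd": round(cost_usd(model, input_tokens, output_tokens), 6),
    }


def spend_by_user(records: list) -> dict:
    """Sum cost per user, the kind of aggregate a dashboard would chart."""
    totals = defaultdict(float)
    for record in records:
        totals[record["user_id"]] += record["cost_usd"]
    return dict(totals)
```

Summing the same records without grouping gives the org-wide figure.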
Tool integrations for Salesforce, Slack, and RAG knowledge, plus mocks for the demo flow. Tool access is governed by the same identity claims that drive classification.
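A small sketch of gating tool calls on the `allowed_mcp_servers` claim. The server identifiers, the tool name, and the `dispatch` function are hypothetical; a real gateway would forward the call to the tool server rather than return a string.

```python
def can_call(claims: dict, server: str) -> bool:
    """True only if the server is listed in the caller's allowed_mcp_servers claim."""
    return server in claims.get("allowed_mcp_servers", [])


def dispatch(claims: dict, server: str, tool: str) -> str:
    """Refuse the call unless the identity claims permit this server."""
    if not can_call(claims, server):
        raise PermissionError(f"server '{server}' not permitted for this identity")
    # Placeholder for the actual tool invocation.
    return f"{server}.{tool}"
```

A token with no `allowed_mcp_servers` claim is treated as permitting nothing, which matches the deny-by-default stance of the rest of the gateway.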