Multi-Agent UPI Fraud Arena

The benchmark where scammers train against defenders.

Five agents (Scammer, Victim, on-device Analyzer LLM, Bank Monitor, Regulator) run adversarial fraud episodes under structural information asymmetry. Two adapters are trained. The Analyzer (Qwen2.5-7B + LoRA, 8-rubric GRPO) reaches 99.3% detection at 6.7% FPR. The Scammer (Qwen2.5-0.5B + LoRA, adversarial GRPO) achieves a 93.75% bypass rate against the rule-based detector, so a 0.5B model beats 70B+ frontier LLMs at detector evasion.
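The headline bypass figure is reported as best-of-8 against the rules. A common way to compute such a rate, assumed here rather than taken from the project's evaluation code, is to sample k messages per scenario and count the scenario as bypassed if at least one of them goes unflagged. The `flags` callable and the keyword detector below are hypothetical stand-ins, and the demo uses k = 2 for brevity.

```python
from typing import Callable, Sequence


def best_of_k_bypass_rate(
    samples_per_scenario: Sequence[Sequence[str]],
    flags: Callable[[str], bool],
) -> float:
    """Fraction of scenarios where at least one sampled message evades the detector.

    samples_per_scenario[i] holds the k messages sampled for scenario i.
    flags(msg) returns True when the (hypothetical) rule-based detector flags msg.
    """
    if not samples_per_scenario:
        raise ValueError("need at least one scenario")
    bypassed = sum(
        any(not flags(msg) for msg in samples)
        for samples in samples_per_scenario
    )
    return bypassed / len(samples_per_scenario)


# Hypothetical keyword detector, for illustration only.
SUSPICIOUS = ("otp", "kyc", "urgent")


def toy_rule_detector(msg: str) -> bool:
    text = msg.lower()
    return any(word in text for word in SUSPICIOUS)


if __name__ == "__main__":
    scenarios = [
        ["Share your OTP now", "Your KYC expires today"],             # every sample flagged
        ["Urgent: verify KYC", "Hi, refund pending on your UPI ID"],  # one sample evades
    ]
    print(best_of_k_bypass_rate(scenarios, toy_rule_detector))  # 0.5
```

Under this definition a best-of-k rate can only rise as k grows, which is why the sample budget (here, 8) belongs next to the number.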


OpenEnv Hackathon 2026 · MIT License · CC-BY-4.0 dataset · n = 175 bench scenarios

v2 detection rate: 99.3% (vs 100% for v1, which was reward-hacked)

v2 FPR: 6.7% (v1 was 36%)

F1 score: 0.99 (+0.03 vs v1)

Novel-scam detection: 97.1% (post-2024 scams)

Bench size: 175 scenarios

Scammer LoRA bypass (0.5B): 93.75% (best-of-8 vs rules · beats 70B+ frontier LLMs)
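Detection rate, FPR and F1 follow the standard confusion-matrix definitions. Detection rate is taken here to mean recall on scam scenarios, FPR is the share of benign scenarios wrongly flagged, and F1 is the harmonic mean of precision and detection rate. The sketch below computes them from boolean labels and predictions and adds a percentile bootstrap CI in the spirit of `GET /eval/bootstrap`. The function names, the choice to resample whole scenarios, and the seed handling are assumptions, not the project's evaluation code.

```python
import random
from typing import Sequence


def confusion(labels: Sequence[bool], preds: Sequence[bool]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn). True means 'scam' in labels and 'flagged' in preds."""
    tp = sum(y and p for y, p in zip(labels, preds))
    fp = sum((not y) and p for y, p in zip(labels, preds))
    fn = sum(y and (not p) for y, p in zip(labels, preds))
    tn = sum((not y) and (not p) for y, p in zip(labels, preds))
    return tp, fp, fn, tn


def metrics(labels: Sequence[bool], preds: Sequence[bool]) -> dict[str, float]:
    tp, fp, fn, tn = confusion(labels, preds)
    detection = tp / (tp + fn) if tp + fn else 0.0  # recall on scam scenarios
    fpr = fp / (fp + tn) if fp + tn else 0.0        # benign scenarios wrongly flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * detection / (precision + detection)
          if precision + detection else 0.0)
    return {"detection": detection, "fpr": fpr, "f1": f1}


def bootstrap_ci(labels, preds, key="detection", iters=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample scenarios with replacement, recompute the metric,
    and read the (alpha/2, 1 - alpha/2) percentiles off the sorted statistics."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metrics([labels[i] for i in idx], [preds[i] for i in idx])[key])
    stats.sort()
    lo = stats[int((alpha / 2) * iters)]
    hi = stats[int((1 - alpha / 2) * iters) - 1]
    return lo, hi


if __name__ == "__main__":
    labels = [True, True, True, False, False]  # toy data, not bench results
    preds = [True, True, False, True, False]
    print(metrics(labels, preds))
    print(bootstrap_ci(labels, preds))
```

With only a few hundred scenarios, the bootstrap interval is a useful reminder of how much a single misclassified scenario moves each headline percentage.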

Five-agent arena

🎭 Scammer

Qwen2.5-0.5B + LoRA trained via GRPO to craft convincing UPI fraud scripts across banking, KYC, OTP and CEO-deepfake categories.
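GRPO (Group Relative Policy Optimization) replaces a learned value baseline with a group baseline: several completions are sampled for the same prompt, and each completion's advantage is its reward standardized against the group's mean and standard deviation. The sketch below shows only that step. The binary "evaded the detector" reward is an assumption for illustration; the Scammer's actual adversarial reward may be shaped differently. It uses the population standard deviation, and implementations differ on that detail.

```python
import statistics
from typing import Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (r_i - mean(group)) / (std(group) + eps).

    All rewards belong to completions sampled for the same prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical Scammer reward: 1.0 if the sampled script evaded the detector, else 0.0.
def evasion_reward(flagged: bool) -> float:
    return 0.0 if flagged else 1.0


if __name__ == "__main__":
    flagged = [True, False, True, True]  # detector verdicts for 4 samples of one prompt
    rewards = [evasion_reward(f) for f in flagged]
    print(group_relative_advantages(rewards))
```

The one evading sample gets a positive advantage and the flagged ones get negative advantages; if every sample in a group is flagged (or every one evades), all advantages are zero and that prompt contributes no gradient.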

🛡 Analyzer LLM

Qwen2.5-7B + LoRA, post-trained with GRPO on an 8-rubric reward. The v2 retrain fixed reward hacking: FPR dropped roughly 5× (36% → 6.7%) while detection stayed at 99.3% (v1: 100%).
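The v1 numbers (100% detection, 36% FPR) are consistent with a reward that pays for catching scams but charges little or nothing for false alarms, so flagging nearly everything becomes the best strategy. The toy comparison below makes that concrete with two hypothetical reward functions and two hypothetical policies. None of the weights come from the project's rubric.

```python
from typing import Callable, Sequence

Episode = tuple[bool, bool]  # (is_scam, flagged)


def detection_only_reward(is_scam: bool, flagged: bool) -> float:
    # Hypothetical v1-style reward: credit for catching scams, no cost for false alarms.
    return 1.0 if (is_scam and flagged) else 0.0


def calibrated_reward(is_scam: bool, flagged: bool) -> float:
    # Hypothetical v2-style reward: correct calls earn credit, false alarms are penalized.
    if is_scam:
        return 1.0 if flagged else 0.0
    return -1.0 if flagged else 1.0


def mean_reward(reward: Callable[[bool, bool], float], episodes: Sequence[Episode]) -> float:
    return sum(reward(s, f) for s, f in episodes) / len(episodes)


if __name__ == "__main__":
    labels = [True, True, False, False]                     # toy mix: 2 scams, 2 benign
    always_flag = [(s, True) for s in labels]                # reward-hacking policy
    discerning = list(zip(labels, [True, False, False, False]))  # misses one scam, no false alarms
    for name, reward in [("detection-only", detection_only_reward),
                         ("calibrated", calibrated_reward)]:
        print(name, mean_reward(reward, always_flag), mean_reward(reward, discerning))
    # detection-only 0.5 0.25
    # calibrated 0.0 0.75
```

Under the detection-only reward the flag-everything policy wins; once false alarms cost reward, the discerning policy wins even though it misses a scam.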

🏦 Bank Monitor

Rule-based transaction watchdog that applies velocity limits, amount thresholds, and beneficiary trust scores in real time during each episode.
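A rule-based monitor of this kind can be expressed as a few independent checks over each transaction. The sketch below is hypothetical: the `Transaction` fields, the thresholds (5 transfers per 10 minutes, INR 50,000 per transfer, a trust floor of 0.3) and the sliding-window velocity check are illustrative choices, not the Bank Monitor's actual configuration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Transaction:
    timestamp: float          # seconds since episode start
    amount: float             # INR
    beneficiary_trust: float  # 0.0 (unknown/risky) .. 1.0 (well-established payee)


@dataclass
class BankMonitor:
    # Hypothetical thresholds; create one monitor per episode so state does not leak.
    max_txns_per_window: int = 5
    window_seconds: float = 600.0
    amount_threshold: float = 50_000.0
    min_trust: float = 0.3
    _recent: deque = field(default_factory=deque, init=False, repr=False)

    def check(self, txn: Transaction) -> list[str]:
        """Return the names of the rules this transaction trips (empty list = allow)."""
        alerts = []
        # Velocity: drop timestamps that fell out of the sliding window, then count.
        while self._recent and txn.timestamp - self._recent[0] > self.window_seconds:
            self._recent.popleft()
        self._recent.append(txn.timestamp)
        if len(self._recent) > self.max_txns_per_window:
            alerts.append("velocity")
        if txn.amount > self.amount_threshold:
            alerts.append("amount")
        if txn.beneficiary_trust < self.min_trust:
            alerts.append("beneficiary_trust")
        return alerts


if __name__ == "__main__":
    monitor = BankMonitor()
    print(monitor.check(Transaction(0, 1_000, 0.9)))    # []
    print(monitor.check(Transaction(30, 75_000, 0.1)))  # ['amount', 'beneficiary_trust']
```

Returning rule names rather than a single boolean keeps the monitor's verdict explainable, which matters when its alerts are compared against the Analyzer's.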

⚖️ Composable Reward

8-leaf rubric with independently tunable weights. Reward hacking is made visible: toggle between the v1 and v2 profiles on the same Analyzer output.
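A composable reward of this shape is a weighted sum of per-leaf scores, with each profile supplying its own weights. The leaf names and both weight profiles below are hypothetical placeholders (the real leaves of `AnalyzerRubricV2` are not listed here). The point is that the same per-leaf scores yield different totals under v1 and v2, which is what the profile toggle exposes.

```python
from typing import Mapping

# Hypothetical leaf names: placeholders, not the project's actual rubric leaves.
LEAVES = ("detection", "false_positive", "calibration", "explanation",
          "category", "signal_coverage", "format", "brevity")

# Hypothetical weight profiles (each sums to 1.0). v1 leans almost entirely on
# detection; v2 spreads weight so that false alarms cost reward.
PROFILES: dict[str, dict[str, float]] = {
    "v1": {"detection": 0.9, "false_positive": 0.0, "calibration": 0.0, "explanation": 0.1,
           "category": 0.0, "signal_coverage": 0.0, "format": 0.0, "brevity": 0.0},
    "v2": {"detection": 0.3, "false_positive": 0.3, "calibration": 0.1, "explanation": 0.1,
           "category": 0.05, "signal_coverage": 0.05, "format": 0.05, "brevity": 0.05},
}


def composite_reward(leaf_scores: Mapping[str, float], profile: str) -> float:
    """Weighted sum of leaf scores (each assumed in [0, 1]) under the named profile."""
    weights = PROFILES[profile]
    missing = set(LEAVES) - set(leaf_scores)
    if missing:
        raise KeyError(f"missing leaf scores: {sorted(missing)}")
    return sum(weights[leaf] * leaf_scores[leaf] for leaf in LEAVES)


if __name__ == "__main__":
    # One hypothetical Analyzer output: strong on every leaf except false_positive.
    scores = {leaf: 1.0 for leaf in LEAVES}
    scores["false_positive"] = 0.0
    for name in PROFILES:
        print(name, round(composite_reward(scores, name), 3))
```

Keeping the leaf scores separate from the weights means a profile change never requires re-scoring outputs, so v1 and v2 can be compared on identical Analyzer responses.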

API endpoints

/demo/: Interactive Gradio UI. Replay curated episodes or score your own message.
GET /health: OpenEnv liveness probe. Returns {"status": "healthy"}.
GET /metadata: Environment metadata (action/observation schema, version).
GET /schema: Pydantic model JSON schemas for the action and observation.
GET /leaderboard: Ranked submissions on chakravyuh-bench-v0.
GET /eval: v2 eval artifact with detection, FPR, F1 and a per-difficulty breakdown.
GET /eval/bootstrap: 95% confidence intervals from a 10,000-iteration percentile bootstrap.
POST /diagnose: Score one message and get the full 8-rubric AnalyzerRubricV2 decomposition.
/docs · /openapi.json: Interactive API explorer and OpenAPI 3.1 schema.
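A client needs only the standard library to call these endpoints. The sketch below builds requests for `GET /health` and `POST /diagnose` with `urllib.request`. The base URL `http://localhost:8000` and the `{"message": ...}` request body are assumptions; check the actual request model at `/schema` or `/docs` before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: wherever the arena server is running


def health_request(base_url: str = BASE_URL) -> urllib.request.Request:
    return urllib.request.Request(f"{base_url}/health", method="GET")


def diagnose_request(message: str, base_url: str = BASE_URL) -> urllib.request.Request:
    # Assumed body shape; confirm the real field names via GET /schema or /docs.
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/diagnose",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, `send(health_request())` should return `{"status": "healthy"}`, and `send(diagnose_request("Share the OTP to avoid account block"))` returns whatever per-rubric decomposition the server emits. Building requests separately from sending them keeps the request shape easy to inspect and test offline.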