Paste your prompt. We run it through an evolutionary battle — two versions compete, the winner mutates, repeat. You get back a stronger, proven prompt in minutes.
Most prompt engineering is guesswork. We apply structured selection pressure — the same mechanism that's been producing better results for 3.8 billion years.
Drop in your current system prompt and a couple of questions that represent how you actually use it.
Two variants of your prompt compete on your questions. They respond, rebut, and get scored by an impartial AI judge.
The winner survives. The best ideas from the loser get absorbed. A stronger version enters the next round.
All battle learnings are distilled into one clean, improved prompt. Ready to drop in wherever you need it.
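For readers who want the loop spelled out, the steps above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the `evolve` function and its `mutate`, `judge`, and `absorb` callables are hypothetical names, and in a real run each of them would presumably be backed by model calls.

```python
from typing import Callable, List, Tuple


def evolve(
    seed_prompt: str,
    questions: List[str],
    mutate: Callable[[str], str],
    judge: Callable[[str, str, List[str]], Tuple[float, float]],
    absorb: Callable[[str, str], str],
    n_rounds: int = 3,
) -> str:
    """Run a champion-vs-challenger loop (all callables are hypothetical stand-ins)."""
    champion = seed_prompt
    for _ in range(n_rounds):
        # A mutated variant of the current champion enters the arena.
        challenger = mutate(champion)
        # The judge scores both variants on the same test questions.
        champ_score, chall_score = judge(champion, challenger, questions)
        if chall_score > champ_score:
            winner, loser = challenger, champion
        else:
            winner, loser = champion, challenger
        # The winner survives and takes on the loser's best ideas.
        champion = absorb(winner, loser)
    return champion
```

With toy stand-ins (a `mutate` that appends a sentence, a `judge` that favors the longer prompt, an `absorb` that returns the winner unchanged), two rounds simply append the sentence twice. That is enough to check the control flow before wiring in real model calls.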
Paste your prompt below, add one or two questions you want it to handle well, and let the arena do the rest.
Self-hosting requires an Anthropic API key stored on your server. Or sign up for a hosted plan to run without setup.
These are actual outputs from real runs — not hand-picked examples. Every score came from an AI judge. This is what evolution looks like.
Submit and poll. No SDKs required — just standard HTTP. Integrate prompt evolution directly into your CI/CD or agent pipelines.
# 1. Submit your prompt
curl -X POST /optimize \
  -H "Content-Type: application/json" \
  -d '{
    "system_prompt": "You are a support agent...",
    "test_questions": [
      "My order is 2 weeks late",
      "I want a refund now"
    ],
    "n_rounds": 3
  }'

# Returns immediately:
{ "job_id": "a3f9b2...", "status": "pending" }

# 2. Poll for results
curl /jobs/a3f9b2...

# When done:
{
  "status": "done",
  "result": {
    "winner_prompt": "You are a support agent who leads with empathy, then resolution...",
    "score_delta": 6.3,
    "improvement_notes": "...",
    "rounds": [...]
  }
}
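For a CI/CD or agent pipeline, the same submit-and-poll flow can be wrapped in a few lines of standard-library Python. This is a sketch under stated assumptions: `BASE_URL` is a placeholder because no host is given here, the helper names `http_json` and `optimize` are illustrative, and the only status values relied on are the two shown above (`pending` and `done`). Any other status is simply polled again until the attempt limit is reached.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# Assumption: placeholder host; replace with your own deployment's address.
BASE_URL = "http://localhost:8000"


def http_json(method: str, url: str,
              payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Send an HTTP request with an optional JSON body and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def optimize(
    system_prompt: str,
    test_questions: List[str],
    n_rounds: int = 3,
    base_url: str = BASE_URL,
    request: Callable[..., Dict[str, Any]] = http_json,
    poll_interval: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a job to /optimize, then poll /jobs/<job_id> until status is 'done'."""
    job = request("POST", f"{base_url}/optimize", {
        "system_prompt": system_prompt,
        "test_questions": test_questions,
        "n_rounds": n_rounds,
    })
    job_id = job["job_id"]
    for _ in range(max_polls):
        status = request("GET", f"{base_url}/jobs/{job_id}")
        if status.get("status") == "done":
            return status["result"]
        # Only 'pending' and 'done' are documented; keep waiting on anything else.
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A pipeline step could then call `optimize("You are a support agent...", ["My order is 2 weeks late"])` and read `winner_prompt` from the returned dictionary. The `request` and `sleep` parameters exist so the flow can be exercised in tests without a live server.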
No surprises. A 3-round optimization costs fractions of a cent in API calls.