Question 1

What is Hippocratic Bench?

Accepted Answer

Hippocratic Bench is an AI benchmark that evaluates large language models (LLMs) on real-world clinical healthcare scenarios. It simulates a 52-week dialysis clinic where AI must balance patient safety with financial sustainability, testing ethical decision-making with life-or-death consequences.

Question 2

How does Hippocratic Bench compare AI models?

Accepted Answer

Models are ranked using the Hippocratic Clinical Score, a safety-first weighted geometric score that prioritizes deaths and hospitalization burden, then treatment delivery, care quality, operations, financial sustainability, access stewardship, and regulatory performance. Access growth is evaluated separately from patient safety so unsafe expansion cannot dominate the leaderboard.

Question 3

Which AI models can be tested on Hippocratic Bench?

Accepted Answer

Hippocratic Bench supports all major LLMs via OpenRouter including Openai GPT-5.1, Openai GPT-5.1o, Claude 3.5 Sonnet, Claude 3 Opus, Gemini Pro, Gemini 1.5, Llama 3, Mistral, and any other OpenRouter-compatible model.

Question 4

Why is clinical benchmarking important for AI?

Accepted Answer

Traditional AI benchmarks test factual accuracy but don't reveal how AI behaves when lives are at stake. Hippocratic Bench creates real ethical dilemmas with compounding consequences, exposing whether AI prioritizes patient safety or profit maximization under pressure.