Developer Tools (Tool)
About
DeepEval is an open-source LLM evaluation framework — think Pytest for LLMs — with 40+ research-backed metrics for RAG, agents, and safety that run as unit tests.
Why it made the leaderboard
It brings a Pytest-style workflow to LLM evaluation with 40+ ready-made metrics (including hallucination, RAG faithfulness, and answer relevancy), so you can assert on model quality in the same test suite as your code.
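To make "asserting on model quality" concrete, here is a minimal, dependency-free sketch of the pattern. It does not use DeepEval's API: `ToyTestCase`, `token_overlap`, and `assert_quality` are hypothetical names invented for illustration, and the token-overlap score is only a toy stand-in for a real metric such as answer relevancy.

```python
from dataclasses import dataclass


@dataclass
class ToyTestCase:
    """Hypothetical container for one evaluated interaction (not a DeepEval class)."""
    input: str
    actual_output: str
    expected_output: str


def token_overlap(case: ToyTestCase) -> float:
    """Toy stand-in metric: fraction of expected tokens found in the actual output."""
    expected = set(case.expected_output.lower().split())
    actual = set(case.actual_output.lower().split())
    if not expected:
        return 1.0
    return len(expected & actual) / len(expected)


def assert_quality(case: ToyTestCase, threshold: float = 0.7) -> None:
    """Fail like an ordinary unit test when the score is below the threshold."""
    score = token_overlap(case)
    assert score >= threshold, f"score {score:.2f} below threshold {threshold}"


def test_refund_answer():
    # In a real suite, actual_output would come from calling the model under test.
    case = ToyTestCase(
        input="What is the refund window?",
        actual_output="You can request a refund within 30 days",
        expected_output="refund within 30 days",
    )
    assert_quality(case)
```

Saved in a file such as `test_quality.py` (an assumed name), this would be collected by `pytest` alongside regular unit tests, so a drop in output quality fails the build the same way a code regression would.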
Tags
eval, llm, testing, rag, metrics, pytest
Tech Stack
Python