VIBE
Developer Tools · Open Source · Tool · 1h ago · 16.6k

About

DeepEval is an open-source LLM evaluation framework (think Pytest for LLMs) with 40+ research-backed metrics covering RAG, agents, and safety, all of which run as unit tests.

Why it made the leaderboard

It brings a Pytest-style workflow to LLM evaluation with 40+ ready-made metrics (hallucination, RAG faithfulness, answer relevancy), so you can assert on model quality in the same test suite as your code.
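To make the "assert on model quality" idea concrete, here is a minimal sketch of the pattern in plain Python. It deliberately does not use DeepEval's own API: `ToyTestCase`, `keyword_relevancy`, and `assert_metric` are hypothetical names invented for this illustration, and the keyword-overlap score is a toy stand-in for a real relevancy metric.

```python
from dataclasses import dataclass


@dataclass
class ToyTestCase:
    """Hypothetical container for one LLM interaction (not a DeepEval class)."""
    input: str
    actual_output: str


def _words(text: str) -> set:
    """Lowercase the text and strip basic punctuation from each word."""
    return {w.strip("?.,!").lower() for w in text.split()}


def keyword_relevancy(case: ToyTestCase) -> float:
    """Toy metric: share of the question's longer words that reappear in the answer."""
    question_words = {w for w in _words(case.input) if len(w) > 3}
    if not question_words:
        return 0.0
    return len(question_words & _words(case.actual_output)) / len(question_words)


def assert_metric(case: ToyTestCase, metric, threshold: float) -> None:
    """Fail like any other unit test when the metric score is below the threshold."""
    score = metric(case)
    assert score >= threshold, f"{metric.__name__} scored {score:.2f}, below {threshold}"


def test_refund_answer_is_relevant():
    # Pytest collects this function like any other test in the suite.
    case = ToyTestCase(
        input="What is your refund policy?",
        actual_output="Our refund policy allows returns within 30 days.",
    )
    assert_metric(case, keyword_relevancy, threshold=0.5)
```

Saved in a file such as `test_llm_quality.py` (an assumed name), this runs under plain `pytest` alongside ordinary unit tests. The design point is that a quality score plus a threshold turns into a pass/fail assertion, which is the workflow the listing describes; a real metric would replace the toy scoring function.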

Tags

eval, llm, testing, rag, metrics, pytest

Tech Stack

Python
