HumanEval

Category: Developer Tools
Pricing: Open Source
Type: TOOL
Builder: openai
GitHub: 3.3k stars
Added: May 22, 2026

About

OpenAI's code evaluation dataset and harness from the Codex paper: 164 hand-written Python problems for benchmarking LLMs on code generation.
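As a rough illustration of how a problem of this kind is scored, the sketch below joins a function prompt, a model completion, and a unit test, then runs the result. The problem record is made up for this example and is not taken from the dataset. The field names (`task_id`, `prompt`, `entry_point`, `test`) and the `passes` helper are assumptions chosen for illustration, not the harness's own API.

```python
# Hypothetical problem record (not from the dataset); field names are assumed.
problem = {
    "task_id": "Example/0",
    "prompt": (
        "def add(a: int, b: int) -> int:\n"
        '    """Return the sum of a and b."""\n'
    ),
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}


def passes(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion satisfies the problem's test.

    Illustrative helper only: it runs code with exec() in-process,
    with no sandbox and no timeout.
    """
    program = (
        problem["prompt"]
        + completion
        + "\n"
        + problem["test"]
        + "\n"
        + f"check({problem['entry_point']})\n"
    )
    namespace: dict = {}
    try:
        exec(program, namespace)  # unsafe for untrusted model output
    except Exception:
        # Any assertion failure, syntax error, or runtime error counts as a fail.
        return False
    return True


good_completion = "    return a + b\n"
bad_completion = "    return a - b\n"

print(passes(problem, good_completion))
print(passes(problem, bad_completion))
```

Running model-generated code this way is only reasonable for trusted input. Anything produced by a model should be executed in an isolated process with a time limit.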

Why it made the leaderboard

It contains the 164 hand-written Python problems from OpenAI's Codex paper and remains a common reference benchmark for comparing LLMs on code against a standard test set.

Tech Stack

Python

Indexed by a proprietary survey. Corrections welcome.
