Developer Tools (Tool)
About
OpenAI's code evaluation dataset and harness from the Codex paper: 164 hand-written Python programming problems, each checked by unit tests, for benchmarking LLMs on code generation.
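To make the idea of an evaluation harness concrete, here is a minimal sketch of checking one model completion for functional correctness. The toy problem is hypothetical and not taken from the dataset. Its field names (`task_id`, `prompt`, `entry_point`, `test`) follow the style of the dataset's records, and `passes` is an illustrative helper, not the harness's actual API.

```python
# Toy problem in the style of a benchmark record (hypothetical, not from the dataset).
problem = {
    "task_id": "Toy/0",
    "prompt": 'def add(a, b):\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
    "test": (
        "def check(candidate):\n"
        "    assert candidate(1, 2) == 3\n"
        "    assert candidate(-1, 1) == 0\n"
    ),
}


def passes(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion passes the problem's unit tests."""
    # Assemble one program: function signature, model-written body, then tests.
    program = problem["prompt"] + completion + "\n" + problem["test"]
    namespace: dict = {}
    try:
        # WARNING: exec runs untrusted code in this process. A real harness
        # should isolate this step (separate process, timeout, sandbox).
        exec(program, namespace)
        namespace["check"](namespace[problem["entry_point"]])
        return True
    except Exception:
        # Any failure (syntax error, wrong answer, runtime error) counts as a miss.
        return False


good = "    return a + b\n"  # a correct function body
bad = "    return a - b\n"   # a body that fails the first assertion

print(passes(problem, good))
print(passes(problem, bad))
```

A benchmark score is then an aggregate of such pass/fail outcomes over all problems and sampled completions. This sketch omits timeouts and sandboxing, which matter when running model-generated code.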
Tags
benchmark, llm, code, openai, evaluation
Tech Stack
Python