Developer Tools · Open Source · Tool · 19d ago · 3.3k

About

HumanEval is OpenAI's code evaluation dataset and harness from the Codex paper. It contains 164 hand-written Python programming problems for benchmarking LLMs on code generation.
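The snippet below is a minimal sketch of how completions can be scored. It assumes the record layout of the released dataset: `task_id`, `prompt`, `entry_point`, and `test`, where `test` defines a `check(candidate)` function. The toy `add` problem is invented for illustration and is not one of the 164 problems. `check_correctness` and `pass_at_k` here are simplified stand-ins, not the harness's own functions. The estimator is the unbiased pass@k from the Codex paper, 1 - C(n-c, k) / C(n, k), for n samples of which c pass. Unlike the official harness, this sketch runs model output with a bare `exec()`, so there is no isolation and no timeout.

```python
from math import comb

# Toy record shaped like a HumanEval entry (hypothetical problem, not from the dataset).
problem = {
    "task_id": "Toy/0",
    "prompt": 'def add(a: int, b: int) -> int:\n    """Return the sum of a and b."""\n',
    "entry_point": "add",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n    assert candidate(-1, 1) == 0\n",
}


def check_correctness(problem: dict, completion: str) -> bool:
    """Return True if prompt + completion passes the problem's tests.

    WARNING: exec() runs arbitrary model output with no sandbox or timeout.
    Use it only on trusted code; this is an illustration, not the real harness.
    """
    program = (
        problem["prompt"]
        + completion
        + "\n"
        + problem["test"]
        + "\n"
        + f"check({problem['entry_point']})\n"
    )
    try:
        exec(program, {})  # fresh globals dict for each program
        return True
    except Exception:  # failed assertion, syntax error, runtime error, ...
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Four hypothetical model completions for the toy problem.
samples = [
    "    return a + b\n",
    "    return a - b\n",
    "    return b + a\n",
    "    return a * b\n",
]
n = len(samples)
c = sum(check_correctness(problem, s) for s in samples)
print(f"{c} of {n} samples pass")
print(f"pass@1 = {pass_at_k(n, c, 1):.3f}")
print(f"pass@2 = {pass_at_k(n, c, 2):.3f}")
```

With two of four samples correct, pass@1 comes out to 0.500 and pass@2 to 0.833. Computing pass@k from n ≥ k samples avoids the high variance of drawing exactly k samples per problem.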

Tags

benchmark, llm, code, openai, evaluation

Tech Stack

Python
