About
Eagle 2.5 is a frontier vision-language model (VLM) from NVIDIA designed for long-context multimodal understanding, with a focus on long video comprehension and high-resolution image analysis. It introduces two training techniques, Automatic Degrade Sampling and Image Area Preservation, along with a new 110K video dataset annotated at both the story level and the clip level. Its 8B-parameter model matches GPT-4o and much larger open-source models on the Video-MME benchmark.
Why it made the leaderboard
NVIDIA's frontier vision-language model for long-context multimodal understanding. It handles long video comprehension and high-resolution image analysis using techniques such as Automatic Degrade Sampling and Image Area Preservation.
Tags
vision-language-model, multimodal, long-context, video-understanding, large-language-models, nvidia, open-source, vlm