VIBE

About

Eagle 2.5 is a frontier vision-language model (VLM) from NVIDIA designed for long-context multimodal understanding, with a focus on long video comprehension and high-resolution image analysis. It introduces two training techniques, Automatic Degrade Sampling and Image Area Preservation, along with a new 110K video dataset (Eagle-Video-110K) that carries both story-level and clip-level annotations. Its 8B-parameter model matches GPT-4o and much larger open-source models on the Video-MME benchmark.

Why it made the leaderboard

NVIDIA's frontier vision-language model for long-context multimodal understanding. It handles long video comprehension and high-resolution image analysis using techniques such as Automatic Degrade Sampling and Image Area Preservation.

Tags

vision-language-model, multimodal, long-context, video-understanding, large-language-models, nvidia, open-source, vlm
