LocateAnything
research.nvidia.com · AI Tools · Open Source · 3h ago
About
LocateAnything is a fast, high-quality vision-language grounding framework that uses Parallel Box Decoding (PBD) to predict bounding boxes as atomic units in a single forward pass. It supports diverse localization tasks, including document understanding, GUI grounding, dense object detection, and OCR localization. By decoding the geometric elements of a box in parallel rather than one after another, it achieves up to 2.5× higher throughput while also improving localization accuracy.
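To make the difference between sequential and parallel decoding concrete, here is a minimal toy sketch in Python. It is not LocateAnything's implementation: the function names, the `TOY_BOXES` data, and the (x1, y1, x2, y2) layout are all assumptions made for illustration. It only counts decoding steps when coordinates are emitted one at a time versus when a whole box is emitted as one unit.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) pixel layout

# Stand-in for model predictions (assumption); a real model would compute
# these from the image and the text query.
TOY_BOXES: List[Box] = [(10, 20, 110, 220), (40, 50, 90, 130), (0, 0, 64, 64)]


def decode_sequential(num_boxes: int) -> Tuple[List[Box], int]:
    """Emit one coordinate per decoding step, so each box costs four steps."""
    boxes: List[Box] = []
    steps = 0
    for i in range(num_boxes):
        coords = []
        for c in range(4):
            coords.append(TOY_BOXES[i][c])  # one coordinate per step
            steps += 1
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes, steps


def decode_parallel(num_boxes: int) -> Tuple[List[Box], int]:
    """Emit all four coordinates together, so each box costs one step."""
    boxes: List[Box] = []
    steps = 0
    for i in range(num_boxes):
        boxes.append(TOY_BOXES[i])  # whole box as one atomic unit
        steps += 1
    return boxes, steps


seq_boxes, seq_steps = decode_sequential(len(TOY_BOXES))
par_boxes, par_steps = decode_parallel(len(TOY_BOXES))
print(f"sequential steps: {seq_steps}, parallel steps: {par_steps}")
```

Both functions return the same boxes; only the number of steps differs. The 4:1 step ratio is a property of this toy and should not be read as the source of the 2.5× throughput figure, which depends on the full model and the rest of the generated output.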
Tags
vision-language, object detection, grounding, bounding box, vlm, parallel decoding, computer vision, multimodal