LocateAnything

research.nvidia.com

About

LocateAnything is a vision-language grounding framework that uses Parallel Box Decoding (PBD) to predict bounding boxes as atomic units in a single forward pass. It supports a range of localization tasks, including document understanding, GUI grounding, dense object detection, and OCR localization. By decoding geometric elements in parallel rather than sequentially, it achieves up to 2.5× higher throughput while improving localization accuracy.
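To make the contrast between sequential and parallel decoding concrete, here is a minimal toy sketch. It is not LocateAnything's implementation: `toy_model`, `decode_sequential`, and `decode_parallel` are hypothetical names, and the sketch assumes a box is four coordinates (x1, y1, x2, y2) that a sequential decoder would emit one per step. The step counts are a property of this toy, not a measurement of the framework.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def toy_model(box_index: int, coord_index: int) -> float:
    """Hypothetical stand-in for a model head: returns one coordinate."""
    offsets = (0.0, 1.0, 5.0, 6.0)
    return 10.0 * box_index + offsets[coord_index]


def decode_sequential(num_boxes: int) -> Tuple[List[Box], int]:
    """Emit one coordinate per decoding step (four steps per box)."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        coords = []
        for c in range(4):
            coords.append(toy_model(b, c))
            steps += 1  # each coordinate costs its own step
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes, steps


def decode_parallel(num_boxes: int) -> Tuple[List[Box], int]:
    """Emit all four coordinates of a box together (one step per box)."""
    boxes: List[Box] = []
    steps = 0
    for b in range(num_boxes):
        x1, y1, x2, y2 = (toy_model(b, c) for c in range(4))
        boxes.append((x1, y1, x2, y2))
        steps += 1  # the whole box is treated as one atomic unit
    return boxes, steps


if __name__ == "__main__":
    seq_boxes, seq_steps = decode_sequential(3)
    par_boxes, par_steps = decode_parallel(3)
    print(seq_boxes == par_boxes, seq_steps, par_steps)
```

Both functions return the same boxes; only the number of decoding steps differs. In a real model, the end-to-end gain would also depend on everything else in the forward pass, so the toy's step ratio should not be read as a throughput figure.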