LocateAnything
AI Tools · Open Source

About

LocateAnything is a fast, high-quality vision-language grounding framework that uses Parallel Box Decoding (PBD) to predict bounding boxes as atomic units in a single forward pass. It supports diverse localization tasks, including document understanding, GUI grounding, dense object detection, and OCR localization. Because it decodes geometric elements in parallel rather than sequentially, it achieves up to 2.5× higher throughput than sequential decoding while also improving localization accuracy.
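To make the sequential-versus-parallel contrast concrete, here is a minimal toy sketch that only counts decoder steps. It assumes each box is four quantized coordinates (x1, y1, x2, y2) and that parallel box decoding emits all four coordinates of a box in one step, while sequential decoding emits one coordinate token per step. `ToyDecoder`, `decode_sequential`, and `decode_parallel` are hypothetical names for illustration; this is not the LocateAnything implementation or API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box representation: (x1, y1, x2, y2) as quantized coordinate bins.
Box = Tuple[int, int, int, int]


@dataclass
class ToyDecoder:
    """Hypothetical stand-in for a VLM decoder.

    It simply replays known target boxes; every call to a step method
    counts as one forward pass, so we can compare step counts.
    """
    targets: List[Box]
    forward_passes: int = 0

    def next_coordinate(self, position: int) -> int:
        # Sequential decoding: one forward pass yields one coordinate token.
        self.forward_passes += 1
        box_index, coord_index = divmod(position, 4)
        return self.targets[box_index][coord_index]

    def next_box(self, box_index: int) -> Box:
        # Parallel box decoding (as assumed here): one forward pass yields
        # all four coordinates of a box as a single atomic unit.
        self.forward_passes += 1
        return self.targets[box_index]


def decode_sequential(decoder: ToyDecoder, num_boxes: int) -> List[Box]:
    # Emit 4 * num_boxes coordinate tokens one at a time...
    coords = [decoder.next_coordinate(i) for i in range(4 * num_boxes)]
    # ...then regroup the flat token stream into 4-tuples.
    return [tuple(coords[i:i + 4]) for i in range(0, len(coords), 4)]


def decode_parallel(decoder: ToyDecoder, num_boxes: int) -> List[Box]:
    # Emit one whole box per step.
    return [decoder.next_box(i) for i in range(num_boxes)]


if __name__ == "__main__":
    boxes = [(10, 20, 110, 220), (300, 40, 380, 90), (5, 5, 50, 60)]
    seq, par = ToyDecoder(boxes), ToyDecoder(boxes)
    assert decode_sequential(seq, len(boxes)) == decode_parallel(par, len(boxes)) == boxes
    print(seq.forward_passes, par.forward_passes)  # prints "12 3"
```

Both strategies recover the same boxes, but the parallel variant needs one step per box instead of four. This 4× step reduction is an idealized count over box tokens only; it ignores per-step cost and any non-box text, so it should not be read as a prediction of the reported 2.5× throughput figure.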

Tags

vision-language, object detection, grounding, bounding box, vlm, parallel decoding, computer vision, multimodal