VIBE
AI Tools · Open Source · TOOL · 19d ago · 6.3k

About

Microsoft's prompt and KV-cache compression for LLMs: up to 20x compression with minimal accuracy loss, for cheaper and faster inference.
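
To make the "up to 20x" figure concrete, here is a minimal sketch of how a compression ratio could be computed from token counts. The function names and the token counts are hypothetical and chosen only for illustration; this is not the tool's API, and the numbers are not measured results.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Return how many times smaller the compressed prompt is."""
    if original_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    return original_tokens / compressed_tokens


def tokens_saved(original_tokens: int, compressed_tokens: int) -> int:
    """Return the number of tokens no longer sent to the model."""
    return original_tokens - compressed_tokens


# Hypothetical counts: a 2,000-token prompt compressed to 100 tokens.
ratio = compression_ratio(2000, 100)
saved = tokens_saved(2000, 100)
print(f"{ratio:.0f}x compression, {saved} tokens saved")
```

Under these assumed counts, a 20x ratio means only 5% of the original tokens reach the model, which is where the cost and latency savings would come from.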

Tags

prompt-compression, llm, microsoft, inference, efficiency

Tech Stack

Python
