About
Microsoft's prompt and KV-cache compression for LLMs: up to 20x compression with minimal accuracy loss, for cheaper and faster inference.
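The listing doesn't describe how the compression works. To show what a prompt compression ratio such as "20x" means in practice, here is a minimal toy sketch. It drops the lowest-scoring whitespace tokens until about 1/ratio of them remain and keeps the survivors in their original order. The names `toy_score`, `compress_prompt`, and `compression_ratio` are hypothetical. The stopword-based score is only a stand-in for whatever scoring the tool actually uses, and nothing here reflects the tool's API or method.

```python
from typing import Callable

# Hypothetical stand-in scorer: common function words score 0,
# every other token scores its character length.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in",
             "that", "it", "for", "on", "with", "as"}


def toy_score(token: str) -> float:
    """Return 0 for stopwords, otherwise the token's length."""
    return 0.0 if token.lower() in STOPWORDS else float(len(token))


def compress_prompt(prompt: str, ratio: float,
                    score: Callable[[str], float] = toy_score) -> str:
    """Keep about len(tokens) / ratio of the highest-scoring tokens, in order."""
    tokens = prompt.split()
    if ratio <= 1 or not tokens:
        return prompt
    keep = max(1, round(len(tokens) / ratio))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-score(tokens[i]), i))
    kept = sorted(ranked[:keep])  # restore original token order
    return " ".join(tokens[i] for i in kept)


def compression_ratio(original: str, compressed: str) -> float:
    """Original token count divided by compressed token count."""
    return len(original.split()) / len(compressed.split())


if __name__ == "__main__":
    prompt = "The quick brown fox jumps over the lazy dog and the cat sat on the mat"
    short = compress_prompt(prompt, ratio=4)
    print(short)                             # quick brown jumps over
    print(compression_ratio(prompt, short))  # 4.0
```

A 16-token prompt compressed at ratio 4 keeps 4 tokens. At 20x, the same rule would keep 1 token in 20. Keeping the original token order lets the compressed text still read as a prompt rather than as a bag of words.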
Tags
prompt-compression, llm, microsoft, inference, efficiency
Tech Stack
Python