AI Tools · Open Source · Tool

About

DFlash is a lightweight block diffusion model for speculative decoding. It accelerates LLM inference by drafting a block of tokens in parallel, which the target LLM then verifies. The project provides pre-trained draft models for popular LLMs such as Qwen and LLaMA, enabling significant speedups in text generation.
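To make the draft-then-verify idea concrete, here is a minimal Python sketch of greedy speculative decoding with toy integer "models". It is not DFlash's code or API: `toy_target`, `toy_drafter`, `speculative_step`, `generate`, and `greedy` are hypothetical names, and the drafter is a simple stand-in for DFlash's block diffusion drafter. What carries over is the acceptance rule: drafted tokens are kept while they match the target's greedy choice, and the first mismatch is replaced by the target's own token, so the output is identical to decoding with the target alone.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: the target returns its greedy next token for a prefix.
Target = Callable[[List[Token]], Token]
# Hypothetical interface: the drafter proposes k tokens at once for a prefix.
Drafter = Callable[[List[Token], int], List[Token]]


def speculative_step(target: Target, drafter: Drafter,
                     prefix: List[Token], k: int) -> List[Token]:
    """One draft/verify round under greedy acceptance."""
    proposal = drafter(prefix, k)
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)  # a real system scores all positions in one forward pass
        if tok != expected:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target(ctx))  # whole block accepted: the target adds one bonus token
    return accepted


def generate(target: Target, drafter: Drafter, prompt: List[Token],
             max_new: int, k: int = 4) -> Tuple[List[Token], int]:
    """Decode max_new tokens; also return how many draft/verify rounds it took."""
    seq = list(prompt)
    rounds = 0
    while len(seq) - len(prompt) < max_new:
        seq.extend(speculative_step(target, drafter, seq, k))  # adds >= 1 token
        rounds += 1
    return seq[: len(prompt) + max_new], rounds


def greedy(target: Target, prompt: List[Token], max_new: int) -> List[Token]:
    """Reference: plain one-token-at-a-time greedy decoding with the target."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(target(seq))
    return seq


def toy_target(ctx: List[Token]) -> Token:
    # Toy "LLM": always predicts the next digit, wrapping 9 -> 0.
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[Token], k: int) -> List[Token]:
    # Toy drafter: counts up like the target, but guesses 0 wherever the target would emit 7.
    block, last = [], ctx[-1]
    for _ in range(k):
        nxt = (last + 1) % 10
        if nxt == 7:
            nxt = 0  # deliberate mistake so verification has something to reject
        block.append(nxt)
        last = nxt
    return block


if __name__ == "__main__":
    out, rounds = generate(toy_target, toy_drafter, [0], max_new=12, k=4)
    print(out, rounds)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 3
    assert out == greedy(toy_target, [0], 12)
```

In this toy run, 12 new tokens take 3 draft/verify rounds instead of 12 sequential target steps. In a real deployment the speedup comes from the target scoring the whole drafted block in a single forward pass, which this sketch approximates with a per-position loop.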

Tags

llm, inference, speculative-decoding, optimization, transformers, vllm, sglang

Tech Stack

Python
