
DFlash

Category: AI Tools
Pricing: Open Source
Type: TOOL
Builder: z-lab
GitHub: 5.6k stars
Added: Jun 18, 2026

About

DFlash is a lightweight block diffusion model that serves as the drafter in speculative decoding: it proposes a block of tokens in parallel, and the target LLM then verifies them, which speeds up inference. The project provides pre-trained draft models for popular LLMs such as Qwen and LLaMA.
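To make the draft-and-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding in Python. It is not DFlash's code or API: `toy_target`, `toy_draft_block`, and `speculative_decode` are hypothetical names, the "models" are simple arithmetic rules, and the toy drafter builds its block in a loop, whereas a block diffusion drafter would produce the whole block in parallel. The sketch only shows why the output matches plain greedy decoding while needing fewer rounds.

```python
from typing import Callable, List, Sequence, Tuple

Token = int


def toy_target(context: Sequence[Token]) -> Token:
    """Hypothetical stand-in for the large target model (deterministic greedy rule)."""
    return (sum(context) * 31 + len(context)) % 50


def toy_draft_block(context: Sequence[Token], block_size: int) -> List[Token]:
    """Hypothetical drafter: proposes a block of tokens, deliberately wrong at some positions."""
    ctx = list(context)
    block: List[Token] = []
    for _ in range(block_size):
        tok = toy_target(ctx)
        if len(ctx) % 5 == 0:
            tok = (tok + 1) % 50  # inject a wrong guess so verification has work to do
        block.append(tok)
        ctx.append(tok)
    return block


def greedy_decode(prompt: Sequence[Token],
                  target: Callable[[Sequence[Token]], Token],
                  max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(prompt: Sequence[Token],
                       target: Callable[[Sequence[Token]], Token],
                       draft_block: Callable[[Sequence[Token], int], List[Token]],
                       max_new_tokens: int,
                       block_size: int = 4) -> Tuple[List[Token], int]:
    """Accept the longest drafted prefix the target agrees with, then take the target's token."""
    out = list(prompt)
    produced = 0
    rounds = 0
    while produced < max_new_tokens:
        rounds += 1
        proposal = draft_block(out, block_size)
        for drafted in proposal:
            # A real system would score every drafted position in one batched
            # target forward pass; the toy calls the target position by position.
            expected = target(out)
            out.append(expected)  # equals `drafted` when accepted, else the correction
            produced += 1
            if expected != drafted or produced == max_new_tokens:
                break
        else:
            # Whole block accepted: the verification step also yields one extra token.
            if produced < max_new_tokens:
                out.append(target(out))
                produced += 1
    return out[len(prompt):], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode([1, 2, 3], toy_target, toy_draft_block, 12)
    print(tokens == greedy_decode([1, 2, 3], toy_target, 12), rounds)
```

Because every appended token is the target's own greedy choice, the drafter affects only how many rounds are needed, never the output. The better the drafter's guesses, the more tokens each round accepts.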

Why it made the leaderboard

DFlash drafts tokens in parallel to accelerate LLM inference and ships pre-trained draft models for Qwen and LLaMA. It works across vLLM, SGLang, Transformers, and MLX, so it slots into an existing serving stack.
