Developer Tools (Tool)
About
A high-performance CUDA library for FP8/FP4 tensor operations in large language models, featuring optimized GEMM kernels, MoE fusion, and runtime JIT compilation. Designed for NVIDIA GPUs with clean, accessible code for learning GPU optimization techniques.
Tags
cuda, gpu, machine-learning, tensor, optimization, llm, nvidia, performance