Forge CLI is an AI-powered tool that automatically optimizes GPU kernels. It uses a swarm of AI agents to generate optimized CUDA or Triton code that outperforms PyTorch and torch.compile().
Running forge launches the interactive wizard, which can optimize HuggingFace models, KernelBench tasks, or custom PyTorch files.
Typical results: 1.5x - 5x speedup over baseline PyTorch.
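One way to check a speedup figure like this on your own workload is to time the baseline and the optimized callable and take the ratio. The sketch below is a minimal, framework-agnostic timing helper. `median_seconds` and `speedup` are hypothetical names, not part of Forge, and the two lambdas are stand-ins for real model calls.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the median wall-clock time of fn() over `runs` timed calls."""
    # Warm-up calls are discarded so one-time costs do not skew the result.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, optimized_s: float) -> float:
    """Speedup factor: baseline time divided by optimized time."""
    if baseline_s <= 0 or optimized_s <= 0:
        raise ValueError("timings must be positive")
    return baseline_s / optimized_s


if __name__ == "__main__":
    # Stand-in workloads; replace with the baseline and optimized model calls.
    baseline = lambda: sum(i * i for i in range(200_000))
    optimized = lambda: sum(i * i for i in range(50_000))
    ratio = speedup(median_seconds(baseline), median_seconds(optimized))
    print(f"{ratio:.2f}x")
```

For GPU code, synchronize the device inside `fn` (for example with `torch.cuda.synchronize()`) so the timer covers kernel execution and not only the kernel launch.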
| Backend | Characteristics |
|---|---|
| Triton | Python GPU kernels, easy integration |
| CUDA | Native C++, maximum performance |

| Flag | Mode |
|---|---|
| --turbo | Fast (~2 min) |
| (default) | Balanced |
| --quality | Maximum optimization |

| Command | Description |
|---|---|
| forge | Interactive wizard |
| forge login | Sign in |
| forge credits | Check balance |
| forge browse | Browse KernelBench tasks |
| forge optimize | Start optimization |
| forge session list | View past sessions |
| forge health | Check backend status |
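
The subcommands in the table can also be called from a script, for example to check the backend before starting a long job. This is a minimal sketch, assuming forge is installed and on PATH and that it exits with a non-zero status on failure (an assumption; exit codes are not documented here). `forge_command` and `run_forge` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List

# Subcommands copied from the command table; no other commands or flags are assumed.
KNOWN_SUBCOMMANDS = {"login", "credits", "browse", "optimize", "session list", "health"}


def forge_command(subcommand: str = "") -> List[str]:
    """Build the argument list for a forge call; an empty string means the wizard."""
    if subcommand and subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown forge subcommand: {subcommand!r}")
    return ["forge", *subcommand.split()]


def run_forge(subcommand: str) -> int:
    """Run a forge subcommand and return its exit status."""
    if shutil.which("forge") is None:
        raise FileNotFoundError("forge was not found on PATH")
    # check=False: the caller decides how to treat a non-zero status.
    return subprocess.run(forge_command(subcommand), check=False).returncode
```

A script might call `run_forge("health")` first and only continue to `run_forge("optimize")` when the status is 0, under the exit-code assumption stated above.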

| Requirement | Supported |
|---|---|
| OS | Windows 10+, macOS 10.15+, Linux (glibc 2.17+) |
| Arch | x64 or ARM64 |
| Node.js | 18+ (only for npm install) |
Or re-run the curl/PowerShell installer.
Add to PATH:
Windows: Restart the terminal, or add %USERPROFILE%\.forge\bin to PATH.
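To confirm that the PATH change took effect, a quick check like the following can help. It is a sketch under stated assumptions: it only knows the Windows location given above, and `is_on_path` is a hypothetical helper, not part of Forge.

```python
import os
import shutil


def is_on_path(directory: str, path_value: str, sep: str = os.pathsep) -> bool:
    """Return True if `directory` is one of the entries in a PATH-style string."""

    def norm(p: str) -> str:
        # Strip quotes and trailing separators; fold case where the OS does.
        return os.path.normcase(os.path.normpath(p.strip().strip('"')))

    target = norm(directory)
    return any(norm(entry) == target for entry in path_value.split(sep) if entry.strip())


if __name__ == "__main__":
    # %USERPROFILE% is only expanded by os.path.expandvars on Windows.
    forge_bin = os.path.expandvars(r"%USERPROFILE%\.forge\bin")
    print("forge bin on PATH:", is_on_path(forge_bin, os.environ.get("PATH", "")))
    print("forge resolves to:", shutil.which("forge"))
```

If `shutil.which("forge")` prints `None`, the running shell has not picked up the new PATH yet, which is the case the terminal restart addresses.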

| Optimization type | Cost |
|---|---|
| KernelBench / Custom code | 1 credit |
| HuggingFace (single layer) | 1 credit |
| HuggingFace (multi-layer) | 2 credits |
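
These prices make it straightforward to budget a batch of runs before checking the balance with forge credits. The sketch below assumes each row is the cost of one optimization run. The dictionary keys and `total_credits` are hypothetical labels for the three rows, not identifiers used by Forge.

```python
from typing import Dict, Mapping

# Credit cost per optimization, copied from the pricing rows above.
CREDIT_COST: Dict[str, int] = {
    "kernelbench_or_custom": 1,
    "huggingface_single_layer": 1,
    "huggingface_multi_layer": 2,
}


def total_credits(planned_runs: Mapping[str, int]) -> int:
    """Sum the credits needed for a plan of {optimization type: number of runs}."""
    total = 0
    for kind, count in planned_runs.items():
        if kind not in CREDIT_COST:
            raise KeyError(f"unknown optimization type: {kind!r}")
        if count < 0:
            raise ValueError("run counts must be non-negative")
        total += CREDIT_COST[kind] * count
    return total


if __name__ == "__main__":
    plan = {"kernelbench_or_custom": 3, "huggingface_multi_layer": 2}
    print(total_credits(plan))  # 3 * 1 + 2 * 2 = 7
```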