FP8 Quantization

Explore our entire collection of insights, tutorials, and industry news.

AI Tutorials | June 23, 2026
Deploying GLM-5.2-FP8 (700B MoE) on Modal with 8x H200 GPUs
A technical deep dive into self-hosting Zhipu AI's 700B-parameter MoE model using serverless H200 clusters, vLLM optimizations, and FP8 quantization strategies.
AI Tutorials | May 19, 2026
Optimizing Local LLMs for Production: Qwen2.5 vs Claude 3.5 Sonnet
A technical deep dive into deploying Qwen2.5-32B on local hardware, managing VRAM constraints, and optimizing prompt engineering to match the performance of cloud-based models like Claude 3.5 Sonnet.

