octagono
Tag: inference (5 posts)

  • Microsoft BitNet 1.58: The Era of 1-Bit Large Language Models
    April 25, 2026
  • TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
    April 18, 2026
  • SGLang: Structured Generation Language for Efficient LLM Serving
    April 11, 2026
  • vLLM: High-Throughput LLM Inference at Scale
    April 9, 2026
  • Ollama: Run Local LLMs on Your Own Hardware
    April 8, 2026
© 2026 octagono