Strix Halo HomeLab
Commit 08309b (2025-07-25 14:36:08) by deseven: retested everything and added Linux benchmarks

Guides/AI-Capabilities.md
@@ -13,15 +13,25 @@
 MoE models work very well, hopefully we'll see some 50-70B ones in the future, they could be the real sweet spot for this hardware.
 
-Some real-life examples (KoboldCPP, Vulkan, full GPU offloading, [example config file](./gemma-3-27b.kcpps)):
+Some test results (KoboldCPP 1.96.2, Vulkan, full GPU offloading, [example config file](./gemma-3-27b.kcpps)) **on Windows**:
 
 | Model                 | Quantization | Prompt Processing | Generation Speed |
 | --------------------- | ------------ | ----------------- | ---------------- |
-| Llama 4 Scout 17B 16E | Q4_K_XL      | 106.0 t/s         | 12.7 t/s         |
-| Llama 3.3 70B         | Q4_K_M       | 51.1 t/s          | 4.1 t/s          |
-| Gemma 3 27B           | Q5_K_M       | 94.4 t/s          | 6.2 t/s          |
-| Qwen3 30B A3B         | Q5_K_M       | 94.5 t/s          | **27.8 t/s**     |
-| GLM 4 9B              | Q5_K_M       | **273.7 t/s**     | 15.0 t/s         |
+| Llama 3.3 70B         | Q4_K_M       | 50.9 t/s          | 4.2 t/s          |
+| Gemma 3 27B           | Q5_K_M       | 94.4 t/s          | 6.5 t/s          |
+| Qwen3 30B A3B         | Q5_K_M       | **284.1** t/s     | **30.0** t/s     |
+| GLM 4 9B              | Q5_K_M       | 275.0 t/s         | 15.9 t/s         |
+
+And **on Linux**:
+
+| Model                 | Quantization | Prompt Processing | Generation Speed |
+| --------------------- | ------------ | ----------------- | ---------------- |
+| Llama 3.3 70B         | Q4_K_M       | 50.3 t/s          | 4.4 t/s          |
+| Gemma 3 27B           | Q5_K_M       | 127.0 t/s         | 7.2 t/s          |
+| Qwen3 30B A3B         | Q5_K_M       | 263.0 t/s         | **36.0** t/s     |
+| GLM 4 9B              | Q5_K_M       | **316.6** t/s     | 18.9 t/s         |
 
 All tests were run 3 times and the best result was picked. Generation speed is pretty stable, prompt processing speed fluctuates a bit (5-10%).
 
 #### Additional Resources
 
 - Deep dive into LLM usage on Strix Halo: https://llm-tracker.info/_TOORG/Strix-Halo
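The "run 3 times, pick the best" methodology in the diff above is easy to reproduce against a running KoboldCPP instance. Below is a minimal sketch, not the script used for these tables: it assumes KoboldCPP is already listening on its default port 5001 and uses the KoboldAI-compatible `/api/v1/generate` endpoint that KoboldCPP exposes; the prompt, token count, and URL are illustrative.

```python
# Hypothetical best-of-3 generation-speed sketch for a local KoboldCPP server.
# Assumes KoboldCPP is already running on its default port (5001) with the
# model fully offloaded, as in the benchmark configuration above.
import json
import time
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"  # default KoboldCPP endpoint
MAX_TOKENS = 128   # tokens to generate per run (illustrative)
RUNS = 3           # best-of-3, matching the methodology above

def run_once(prompt: str) -> float:
    """Generate MAX_TOKENS and return approximate tokens/second."""
    payload = json.dumps({"prompt": prompt, "max_length": MAX_TOKENS}).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        json.load(resp)  # block until the full completion arrives
    elapsed = time.perf_counter() - start
    # Wall time also includes prompt processing, so keep the prompt short;
    # this slightly underestimates pure generation speed.
    return MAX_TOKENS / elapsed

if __name__ == "__main__":
    speeds = [run_once("Write a short story about an otter.") for _ in range(RUNS)]
    print(f"runs: {[f'{s:.1f}' for s in speeds]} t/s, best: {max(speeds):.1f} t/s")
```

Because the script only measures wall time, it folds prompt processing into the result; for separate prompt-processing and generation numbers like the columns above, the per-request timings that KoboldCPP prints to its own console are the more reliable source.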