commit 228a94 – Strix Halo HomeLab

Strix Halo HomeLab

Attachments History Blame View Source

About An Otter Wiki

Toggle dark mode Login

Home A - Z Changelog

Menu

GitHub Mirror
Discord Server

Page Index

AI
Guides
Hardware
- Boards
  - Sixunited-AXB35
    
    Firmware
- PCs
Home

AI
AI-Capabilities-Overview
228a94

Commit `228a94`

2025-08-21 17:18:49 lhl: ROCm 6.4.3 actually performs well.

`AI/AI-Capabilities-Overview.md` ..
@@ 103,7 103,7 @@
	If you are setting `page_pool_size` lower than the `pages_limit` you may want to try increasing, eg `amdgpu.vm_fragment_size=8` (4=64K default, 9=2M) to allocate in bigger chunks.

	### ROCm
-	The latest release version of ROCm (6.4.3 as of this writing) has preliminary rocBLAS support for Strix Halo gfx1151, but it is much slower and buggier (and doesn't have hipBLASlt kernels). For the most up-to-date support, currently it's recommended to use the latest gfx1151 [TheRock/ROCm "nightly" release](https://github.com/ROCm/TheRock/blob/main/RELEASES.md). These can be found at [https://therock-nightly-tarball.s3.amazonaws.com/](https://therock-nightly-tarball.s3.amazonaws.com/) (find the filename) or you can use the helper scripts described in the [Releases page]((https://github.com/ROCm/TheRock/blob/main/RELEASES.md)).
+	The latest release version of ROCm (6.4.3 as of this writing) has both rocBLAS and hipBLASlt support for Strix Halo gfx1151. For the most up-to-date builds, you can also install the latest gfx1151 [TheRock/ROCm "nightly" release](https://github.com/ROCm/TheRock/blob/main/RELEASES.md). These can be found at [https://therock-nightly-tarball.s3.amazonaws.com/](https://therock-nightly-tarball.s3.amazonaws.com/) (find the filename) or you can use the helper scripts described in the [Releases page]((https://github.com/ROCm/TheRock/blob/main/RELEASES.md)).

	### Performance Tips
	- If you are not using VFIO or any type of GPU passthrough, you should set `amd_iommu=off` in your kernel options for ~6% faster memory reads (actuall impact on llama.cpp tg performance tends to be smaller, about <2%. Note that when tested, `iommu=pt` does not give any speed benefit.

0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9