Skip to main content

Command Palette

Search for a command to run...

Running llama.cpp (compiled from source) on AMD Strix Halo 395

Updated
1 min read
Running llama.cpp (compiled from source) on AMD Strix Halo 395

Just a quick doc/note/tutorial for referencing myself later.

Here's how to get llama.cpp running with Vulkan support on AMD AI Max 395 (Strix Halo) based devices. I tried it on a Beelink GTR 9 Pro, but should work for Framework Ddesktop too.

Installing a few Prerequisites

sudo apt install amd-smi rocminfo glslc  libvulkan-dev vulkan-tools   mesa-vulkan-drivers   clinfo

Download and build llama.cpp

Get it from github

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp

Build it for Vulkan support

cmake -B build \
  -DGGML_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Release

cmake --build build -j$(nproc)

Run a local LLM model

Once it is built, we can run a model

./build/bin/llama-cli \
  -m ~/.lmstudio/models/lmstudio-community/GLM-4.7-Flash-GGUF/GLM-4.7-Flash-Q4_K_M.gguf \
  -ngl 999 \
  -c 4096 \
  -t $(nproc) \
  --color on \
  -p "Explain how Vulkan helps LLM inference on AMD GPUs."