Llama.cpp over Vulkan on AMD BC-250
by machinez, Jan 2nd, 2025
**PREREQUISITE**
Boot into BIOS
Press F2 on the keyboard

**BIOS MENU**
Advanced
IPv4 PXE Support -> Disabled
Boot Option #1 (boot into the USB Fedora 40 Server installer)
Hit "-"
Save and Exit

**After booting into USB**
Troubleshooting
Install Fedora 40 in basic graphics mode

**After install and first boot into Fedora**
sudo hostnamectl set-hostname YOURHOSTNAME
sudo lvextend -r -l +100%FREE /dev/mapper/fedora-root
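
**Optional sanity check: the root filesystem should now span all free space in the volume group**
df -h /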

sudo dnf makecache --refresh
sudo dnf -y group install "Development Tools"

sudo dnf -y install git cmake glslang rpmdevtools vulkan-headers vulkan-devel vulkan-tools glslc koji python3-pip ccache
cd $(mktemp -d) && koji download-build --arch=x86_64 --arch=noarch kernel-6.2.0-63.fc38 && rm *debug*.rpm *uki*.rpm
sudo dnf -y install *
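
**Optional: list installed kernels; the 6.2.0 fc38 build should now sit alongside the stock Fedora 40 kernel**
rpm -qa 'kernel*' | sort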
cd ~

wget https://kojipkgs.fedoraproject.org/packages/mesa/24.1.5/2.fc40/src/mesa-24.1.5-2.fc40.src.rpm
rpm2cpio mesa-24.1.5-2.fc40.src.rpm | cpio -idmv
mkdir -p rpmbuild/{SPECS,SOURCES}
mv ~/mesa.spec ~/rpmbuild/SPECS/
mv ~/gnome-shell-glthread-disable.patch ~/Mesa-MLAA-License-Clarification-Email.txt ~/rpmbuild/SOURCES/

tar xf mesa-*.tar.xz -C ~/rpmbuild/SOURCES

sed -i 's/#define AMDGPU_NAVI10_RANGE 0x01, 0x0A \/\/# 1 <= x < 10/#define AMDGPU_NAVI10_RANGE 0x01, 0x8A \/\/# 1 <= x < 10/g' ~/rpmbuild/SOURCES/mesa-24.1.5/src/amd/addrlib/src/amdgpu_asic_addr.h
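
**The sed above widens addrlib's Navi10 device-ID range so the BC-250's Oberon (PS5-derived) APU falls inside it; confirm the edit took**
grep 'AMDGPU_NAVI10_RANGE' ~/rpmbuild/SOURCES/mesa-24.1.5/src/amd/addrlib/src/amdgpu_asic_addr.h    # should show 0x01, 0x8A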

cd ~/rpmbuild/SOURCES
tar -cJf mesa-24.1.5.tar.xz ./mesa-24.1.5
cd ~

sudo dnf -y install rust-paste-devel rust-proc-macro2-devel rust-quote-devel rust-syn+clone-impls-devel \
    spirv-tools-devel expat-devel libclc-devel clang-devel flatbuffers-devel flatbuffers-compiler \
    bindgen cbindgen meson valgrind-devel libva-devel libXfixes-devel libXdamage-devel \
    wayland-protocols-devel llvm-devel lm_sensors-devel xtensor-devel python3-devel python3-mako \
    rust-packaging libunwind-devel libXrandr-devel libXxf86vm-devel libselinux-devel \
    libomxil-bellagio-devel libxshmfence-devel libvdpau-devel mesa-libEGL-devel libglvnd-devel \
    spirv-llvm-translator-devel libdrm-devel

rpmbuild -ba ./rpmbuild/SPECS/mesa.spec
cd ~/rpmbuild/RPMS/x86_64/
sudo rpm -ivh --force --nodeps mesa*
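
**Optional: confirm the patched Mesa build is what is installed now**
rpm -q mesa-vulkan-drivers    # expect 24.1.5-2.fc40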

**Swap the installer's nomodeset for the amdgpu scatter/gather display workaround, then regenerate the GRUB config**
sudo sed -i 's/nomodeset/amdgpu.sg_display=0/g' /etc/default/grub
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

**Reboot; GRUB should automatically choose kernel-6.2.0-63.fc38**
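
**After the reboot, verify the kernel and that RADV sees the GPU (vulkan-tools was installed earlier)**
uname -r                                            # should report the 6.2.0-63.fc38 kernel
vulkaninfo --summary | grep -iE 'deviceName|driverName'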

**Install the oberon-governor; this drops idle power by 20-25 W per node**
git clone https://gitlab.com/TuxThePenguin0/oberon-governor.git
cd oberon-governor/
mkdir build
cd build
cmake ..
make
sudo make install
sudo systemctl enable oberon-governor
sudo systemctl start oberon-governor
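
**Optional: check the governor came up cleanly**
systemctl status oberon-governor --no-pager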

cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp/

vim ./ggml/src/ggml-vulkan/ggml-vulkan.cpp

**Make the one-line change marked with "+" below**
******************************************************************************************
diff --git a/ggml/src/ggml-vulkan/ggml-vulkan.cpp b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
index c7ac0e8f..7f69e6eb 100644
--- a/ggml/src/ggml-vulkan/ggml-vulkan.cpp
+++ b/ggml/src/ggml-vulkan/ggml-vulkan.cpp
@@ -1912,6 +1912,7 @@ static vk_device ggml_vk_get_device(size_t idx) {
         device->max_memory_allocation_size = props3.maxMemoryAllocationSize;
     }

+    device->max_memory_allocation_size = 2147483646;
     device->vendor_id = device->properties.vendorID;
     device->subgroup_size = subgroup_props.subgroupSize;
     device->uma = device->properties.deviceType == vk::PhysicalDeviceType::eIntegratedGpu;
******************************************************************************************
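
**The constant 2147483646 is 2^31 - 2 bytes, just under 2 GiB; the override pins max_memory_allocation_size there, presumably because the BC-250 reports a larger single-allocation limit than it can reliably service. Double-check that only the one line changed before building:**
git diff ggml/src/ggml-vulkan/ggml-vulkan.cpp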

cmake -B build -DGGML_VULKAN=1
cmake --build build --config Release

pip install -U "huggingface_hub[cli]"
huggingface-cli download bartowski/Meta-Llama-3.1-8B-Instruct-GGUF Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
mkdir ~/models

ln -s ~/.cache/huggingface/hub/models--bartowski--Meta-Llama-3.1-8B-Instruct-GGUF/snapshots/bf5b95e96dac0462e2a09145ec66cae9a3f12067/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf ~/models/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf
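
**Optional: llama-bench gives a cleaner tokens/s number than eyeballing llama-cli output**
./build/bin/llama-bench -m ~/models/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -ngl 33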

**TEST llama.cpp: you should get 29-33 tok/s**
./build/bin/llama-cli -m ~/models/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -p "You are an expert of food and food preparation. What is the difference between jam, jelly, preserves and marmalade?" -n -2 -e -ngl 33 -t 4 -c 512

**Run with an OpenAI-compatible API**
./build/bin/llama-server -m ~/models/Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -n -2 -e -ngl 33 -t 4 -c 4096 --host 0.0.0.0
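
**Quick smoke test from another machine (NODE_IP is your node's address; llama-server listens on port 8080 by default)**
curl http://NODE_IP:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"Say hello in five words."}]}'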