Building llama.cpp with CUDA on Debian 13 “Trixie”

If you’ve recently upgraded to Debian 13 or are starting from a fresh Trixie install, you may be eager to tap the power of your NVIDIA GPU for machine‑learning workloads. This post walks through each step required to set up the necessary drivers, libraries, and build environment.


Why Enable CUDA in llama.cpp?

The default build of llama.cpp runs on the CPU, which is perfectly fine for small models but becomes a bottleneck with larger ones. Passing the -DGGML_CUDA=ON flag to CMake compiles the CUDA kernels that let your NVIDIA GPU perform inference. The result is typically lower latency and higher throughput for text generation tasks.


Prerequisites

  • A Debian 13 machine with an NVIDIA GPU that supports CUDA 11 or later.
  • Sudo access (or root) to install packages and modify system configuration. The commands in this post are shown without sudo, so run them as root or prefix them with sudo.
  • An active internet connection so the package manager can fetch the necessary files.

Step 1 – Update Kernel Headers

Your system needs the headers that match the running kernel so that the NVIDIA driver can compile its kernel modules.

apt install linux-headers-$(uname -r)

This command pulls the headers for the current kernel release and installs them into the standard package locations.
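To confirm the headers are in place, you can look for the build directory that kernel modules are compiled against. The sketch below assumes the usual Debian layout, /lib/modules/<release>/build. The function name have_headers and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# have_headers RELEASE [ROOT]
# Succeeds when ROOT/lib/modules/RELEASE/build is a directory.
# ROOT defaults to the empty string, i.e. the real filesystem root.
# The path layout is an assumption based on standard Debian packaging.
have_headers() {
    [ -d "${2:-}/lib/modules/$1/build" ]
}

if have_headers "$(uname -r)"; then
    echo "headers present for $(uname -r)"
else
    echo "headers missing for $(uname -r)"
fi
```

If the script reports the headers as missing, rerun the apt command above before moving on to the driver.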


Step 2 – Add Non‑Free Firmware Repositories

The Debian base repositories do not expose the proprietary firmware and driver packages needed for NVIDIA GPUs. By creating an additional source list file, we allow apt to pull the required non‑free components.

Create the file /etc/apt/sources.list.d/non-free.sources and paste the following content:

Types: deb deb-src
URIs: http://deb.debian.org/debian/
Suites: trixie
Components: non-free-firmware contrib
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: http://security.debian.org/debian-security/
Suites: trixie-security
Components: non-free-firmware contrib
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: http://deb.debian.org/debian/
Suites: trixie-updates
Components: non-free-firmware contrib
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

After saving the file, refresh the package lists so the new entries become available:

apt update
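A typo in a .sources file is easy to miss, so it can help to print the components the file actually declares. This is a minimal sketch: list_components is a hypothetical helper that only reads lines starting with "Components:" and does not validate the rest of the deb822 format.

```shell
#!/bin/sh
# list_components FILE
# Prints each distinct component named on a "Components:" line of a
# deb822-style .sources file, one per line, sorted.
list_components() {
    awk '/^Components:/ { for (i = 2; i <= NF; i++) print $i }' "$1" | sort -u
}

src=/etc/apt/sources.list.d/non-free.sources
if [ -r "$src" ]; then
    list_components "$src"
else
    echo "$src not found"
fi
```

For the file shown above, the output should list contrib and non-free-firmware.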

Step 3 – Install the NVIDIA Driver and CUDA Toolkit

3.1 Bring in the NVIDIA Keyring

The NVIDIA distribution for Debian ships a keyring package that allows your system to verify the authenticity of the driver packages. Install it, then refresh the package lists so apt sees the NVIDIA packages:

wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update

3.2 Install Driver Packages

apt -V install nvidia-driver-cuda nvidia-kernel-dkms

The meta‑package nvidia-driver-cuda pulls in the driver components that CUDA applications need; the CUDA toolkit itself is installed separately in step 3.4. The nvidia-kernel-dkms package provides the kernel module through Dynamic Kernel Module Support (DKMS), so the module can be rebuilt against future kernel versions.

3.3 Regenerate Initramfs and Update GRUB

After installing the driver modules, you must ensure that the initramfs contains the new driver and that GRUB will boot into the updated kernel configuration.

update-initramfs -u -k all
update-grub

Reboot the machine to let the new driver take effect.

3.4 Install the CUDA Toolkit

With the driver in place, install the toolkit components that provide nvcc, libraries, and headers used by llama.cpp.

apt install nvidia-cuda-toolkit
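Before moving on, it is worth checking that both the driver utility and the CUDA compiler are reachable from your shell. The helper below, report_tool, is a hypothetical name. It only looks the commands up on PATH and does not prove that the kernel module is loaded.

```shell
#!/bin/sh
# report_tool NAME
# Prints "NAME: found at LOCATION" or "NAME: not found"; always returns 0.
report_tool() {
    if loc=$(command -v "$1" 2>/dev/null); then
        echo "$1: found at $loc"
    else
        echo "$1: not found"
    fi
}

report_tool nvidia-smi   # driver utility installed with the NVIDIA driver
report_tool nvcc         # CUDA compiler installed with the toolkit
```

Once both are found, running nvidia-smi and nvcc --version shows the driver and toolkit versions that are actually in use.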

Step 4 – Install Build Dependencies

The build process for llama.cpp requires several libraries and developer tools. Installing them up front keeps the compile step straightforward.

apt install libtcmalloc-minimal4 libcurl4-openssl-dev glslc cmake make git pkg-config

These packages provide the tcmalloc memory allocator, the libcurl development files (needed for -DLLAMA_CURL=ON), the glslc shader compiler, CMake, Make, Git, and pkg-config.


Step 5 – Clone and Compile llama.cpp

With the environment prepared, fetch the source code and build it.

cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_CUDA=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON
make -j8
echo 'export PATH=$PATH:'$(realpath bin) >> ~/.bashrc

After the build finishes, log out and back in again (or run source ~/.bashrc) so the newly built binaries are on your shell's PATH.
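The echo line in the build commands appends to ~/.bashrc every time it runs, so repeating the build leaves duplicate PATH entries. One way around that is to append only when the line is missing. The function append_path_once is a hypothetical helper, not part of llama.cpp.

```shell
#!/bin/sh
# append_path_once RCFILE DIR
# Appends "export PATH=$PATH:DIR" to RCFILE unless that exact line is
# already there. RCFILE is created if it does not exist.
append_path_once() {
    line="export PATH=\$PATH:$2"
    grep -qxF -- "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}

# Example, run from the build directory:
#   append_path_once ~/.bashrc "$(realpath bin)"
```

Calling it twice with the same directory leaves a single line in the file.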


Step 6 – Keep the Driver in Sync with Kernel Updates

Kernel upgrades are common, and the driver must be rebuilt against each new kernel. Note that $(uname -r) expands to the release of the kernel that is currently running, so run the following routine after booting into the new kernel.

apt install linux-headers-$(uname -r)
apt install --reinstall nvidia-driver-cuda nvidia-kernel-dkms
apt install nvidia-cuda-toolkit
update-initramfs -u -k all
update-grub

Running this sequence after a kernel upgrade should keep the driver loading correctly.
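Because the routine targets the running kernel, it helps to know whether a newer kernel is installed but not yet booted. The sketch below compares uname -r with the highest entry under /lib/modules. It relies on sort -V from GNU coreutils and assumes that directory holds one entry per installed kernel, which may not be true if old kernels left files behind. newest_kernel is a made-up helper name.

```shell
#!/bin/sh
# newest_kernel [MODULES_DIR]
# Prints the highest version-sorted entry in MODULES_DIR
# (default: /lib/modules).
newest_kernel() {
    ls -1 "${1:-/lib/modules}" | sort -V | tail -n 1
}

running=$(uname -r)
newest=$(newest_kernel 2>/dev/null)
if [ -z "$newest" ]; then
    echo "no kernels found under /lib/modules"
elif [ "$running" = "$newest" ]; then
    echo "running the newest installed kernel: $running"
else
    echo "running $running but $newest is installed; reboot before reinstalling"
fi
```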


Step 7 – Updating the Source Tree

When the upstream llama.cpp project publishes a new release or a bug fix, refresh your local copy and rebuild:

cd ~
cd llama.cpp/

# Clean the working directory
git clean -xdf
mkdir build

# Pull the latest changes and submodules
git pull
git submodule update --recursive

# Rebuild
cd build/
cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_CUDA=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON
make -j8

Setting up a mining system with xmr-stak built from source and Ubuntu 16.04

If using an Nvidia GPU, install the Nvidia CUDA toolkit:

Download the “deb (network)” installer from:

https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604

To install, issue the following commands:

$ sudo dpkg -i cuda-repo-ubuntu1604_9.1.85-1_amd64.deb
$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
$ sudo apt-get update
$ sudo apt-get install cuda

Edit the environment to include the CUDA path:

$ sudo nano /etc/environment

Find the PATH variable and include the /usr/local/cuda-9.1/bin folder at the end of the string.

Save the file and reboot.
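After the reboot you can check that the CUDA directory really ended up on PATH. This is a small sketch: path_has is a hypothetical helper that does an exact match on colon-separated entries, so a trailing slash in /etc/environment would count as a different entry.

```shell
#!/bin/sh
# path_has DIR [PATHSTRING]
# Succeeds when DIR is one of the colon-separated entries of PATHSTRING
# (defaults to the current $PATH).
path_has() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_has /usr/local/cuda-9.1/bin; then
    echo "CUDA bin directory is on PATH"
else
    echo "CUDA bin directory is not on PATH"
fi
```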

If using an AMD GPU, install AMD APP SDK 3.0:

Download and install the latest version from:

http://developer.amd.com/amd-accelerated-parallel-processing-app-sdk/

Extract the SDK archive to a location of your choice with the following command:

$ tar -xvjf AMD-APP-SDKInstaller-v<3.0.x.y>-GA-linux64.tar.bz2

Run the installer:

$ sudo ./AMD-APP-SDKInstaller-v<3.0.x.y>-GA-linux64.sh

To fix the libOpenCL issue:

$ cd $AMDAPPSDKROOT/lib/x86_64
$ sudo ln -sf sdk/libOpenCL.so.1 libOpenCL.so

Then log out and log in again.

Installing amdgpu-pro

Download the latest package from:
https://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx

Decompress the file and run the installer with the following commands:

$ tar -xJvf amdgpu-pro-17.40-492261.tar.xz
$ cd amdgpu-pro-17.40-492261
$ sudo ./amdgpu-pro-install -y

Reference:
https://linuxconfig.org/install-amdgpu-pro-16-50-on-ubuntu-16-04-xenial-xerus-linux

Building xmr-stak from source

Install all dependencies:

$ sudo apt install git libmicrohttpd-dev libssl-dev cmake build-essential libhwloc-dev

Create a directory for the source files and clone the source:

$ mkdir GIT-sources
$ cd GIT-sources
$ git clone https://github.com/fireice-uk/xmr-stak.git

Create a build directory:

$ mkdir xmr-stak/build
$ cd xmr-stak/build

Configuring and building xmr-stak

If building xmr-stak for CPU-only mining without HTTP server support, use the following cmake flags:

$ cmake .. -DCUDA_ENABLE=FALSE -DOpenCL_ENABLE=FALSE -DMICROHTTPD_ENABLE=FALSE

If building xmr-stak for AMD GPU mining and CPU mining, use the following cmake flags:

$ cmake .. -DCUDA_ENABLE=FALSE

If building xmr-stak for Nvidia GPU and CPU mining, use the following cmake flags:

$ cmake .. -DOpenCL_ENABLE=FALSE

If building for all (AMD GPU, Nvidia GPU and CPU mining), run cmake without extra flags:

$ cmake ..

After cmake finishes, execute the following to build:

$ make -j4 install
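The four configurations above can be collected in one small helper so that a build script only has to name the target. The function xmr_stak_flags and the target names cpu, amd, nvidia and all are invented for this sketch. The flags themselves are the ones listed above.

```shell
#!/bin/sh
# xmr_stak_flags TARGET
# Prints the cmake flags for TARGET (cpu, amd, nvidia, or all).
# Returns 1 and prints a message on stderr for any other value.
xmr_stak_flags() {
    case "$1" in
        cpu)    echo "-DCUDA_ENABLE=FALSE -DOpenCL_ENABLE=FALSE -DMICROHTTPD_ENABLE=FALSE" ;;
        amd)    echo "-DCUDA_ENABLE=FALSE" ;;
        nvidia) echo "-DOpenCL_ENABLE=FALSE" ;;
        all)    echo "" ;;
        *)      echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Example: show the configure command for an Nvidia GPU + CPU build.
echo "cmake .. $(xmr_stak_flags nvidia)"
```

In a real script you would pass the result to cmake unquoted, for example cmake .. $(xmr_stak_flags amd), so that the flags are split into separate arguments.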

Final system configurations

If using GPU mining (Nvidia GPU or AMD GPU), ensure the user you will use to mine is part of the video group in /etc/group:

$ sudo usermod -a -G video $LOGNAME

Enabling Large Page Support for AMDGPU-PRO

Edit /etc/default/grub and add GRUB_CMDLINE_LINUX="amdgpu.vm_fragment_size=9"

After editing the grub file, do:
$ sudo update-grub
$ sudo reboot

Configuring Large Page Support for the operating system (applies to all GPUs and CPUs)

Create a file named 98-HugePages-miner.conf in /etc/sysctl.d with the following content:

############################
vm.nr_hugepages=128
############################

Add the following lines to /etc/security/limits.conf (where “miner” is the name of your mining account):

############################
miner soft memlock 262144
miner hard memlock 262144
############################
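The two numbers in this section appear to line up: 128 huge pages of 2048 kB each come to 262144 kB, which is the memlock limit set above (limits.conf expresses memlock in kB). That is my reading of the figures rather than something stated here, and it assumes the common 2 MiB huge page size on x86_64. The helper name hugepage_kib is illustrative.

```shell
#!/bin/sh
# hugepage_kib COUNT [PAGE_KIB]
# Prints COUNT * PAGE_KIB, the memory in KiB reserved by COUNT huge pages.
# PAGE_KIB defaults to 2048, an assumption that matches the usual
# 2 MiB huge page size on x86_64.
hugepage_kib() {
    echo $(( $1 * ${2:-2048} ))
}

hugepage_kib 128    # prints 262144

# On a live system the actual page size can be read with:
#   grep Hugepagesize /proc/meminfo
```

If you change vm.nr_hugepages, scaling the memlock values by the same factor keeps the two settings consistent under this assumption.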

Then reboot with:

$ sudo reboot

You will find the xmr-stak binary in ~/GIT-sources/xmr-stak/build/bin. Run xmr-stak and follow the prompts to begin mining.