Local Llama - Dad Hacks

If you’ve recently upgraded to Debian 13 “Trixie” or installed it fresh, you may be eager to use your NVIDIA GPU for machine-learning workloads. This post walks through each step required to set up the drivers, libraries, and build environment, and then to compile llama.cpp with CUDA support.

Why Enable CUDA in llama.cpp?

A default build of llama.cpp runs inference on the CPU, which is fine for small models but becomes a bottleneck with larger ones. Passing -DGGML_CUDA=ON to CMake compiles the CUDA kernels so that model layers can be offloaded to your NVIDIA GPU. For models that fit in GPU memory, this typically means much lower latency and higher throughput for text generation.

Prerequisites

A Debian 13 machine with an NVIDIA GPU that supports CUDA 11 or later.
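Root or sudo access to install packages and modify system configuration. The apt, dpkg, update-initramfs, and update-grub commands below are shown without sudo; run them as root or prefix them with sudo.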
An active internet connection so the package manager can fetch the necessary files.
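
Before installing anything, it can help to confirm that the system actually sees an NVIDIA device. The following is a minimal sketch, assuming lspci (from the pciutils package) is available; the helper name has_nvidia_gpu is made up for this post.

```shell
#!/bin/sh
# has_nvidia_gpu: hypothetical helper. Reads `lspci` output on stdin and
# succeeds if a VGA or 3D controller line mentions NVIDIA.
has_nvidia_gpu() {
    grep -Ei 'vga|3d controller' | grep -qi 'nvidia'
}

# Only meaningful on a machine where lspci is installed.
if command -v lspci >/dev/null 2>&1; then
    if lspci | has_nvidia_gpu; then
        echo "NVIDIA GPU detected"
    else
        echo "No NVIDIA GPU found by lspci"
    fi
fi
```

If nothing is detected, check that the card is seated and enabled in the firmware setup before going further.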

Step 1 – Update Kernel Headers

Your system needs the headers that match the running kernel so that the NVIDIA driver can compile its kernel modules.

apt install linux-headers-$(uname -r)

This command pulls the headers for the current kernel release and installs them into the standard package locations.
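
To double-check that the headers landed, you can look for the build directory of the running kernel. This is a sketch that assumes the usual layout where /lib/modules/<release>/build points at the installed headers; headers_present is a hypothetical helper name.

```shell
#!/bin/sh
# headers_present: hypothetical helper. Succeeds when the build directory
# for the given kernel release exists under the given modules root
# (default: /lib/modules). A dangling symlink counts as missing.
headers_present() {
    release=$1
    modules_root=${2:-/lib/modules}
    [ -d "$modules_root/$release/build" ]
}

if headers_present "$(uname -r)"; then
    echo "Headers found for $(uname -r)"
else
    echo "Headers missing for $(uname -r)"
fi
```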

Step 2 – Add Non‑Free Firmware Repositories

The Debian base repositories do not expose the proprietary firmware and driver packages needed for NVIDIA GPUs. By creating an additional source list file, we allow apt to pull the required non‑free components.

Create the file /etc/apt/sources.list.d/non-free.sources and paste the following content:

Types: deb deb-src
URIs: http://deb.debian.org/debian/
Suites: trixie
Components: contrib non-free non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: http://security.debian.org/debian-security/
Suites: trixie-security
Components: contrib non-free non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb deb-src
URIs: http://deb.debian.org/debian/
Suites: trixie-updates
Components: contrib non-free non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

After saving the file, refresh the package lists so the new entries become available:

apt update
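
A quick way to confirm what the new file enables is to list the components it names. The sketch below assumes the file path used above; list_components is a hypothetical helper, not an apt tool.

```shell
#!/bin/sh
# list_components: hypothetical helper. Prints the unique component names
# found on "Components:" lines of a deb822 .sources file, one per line.
list_components() {
    awk '/^Components:/ { for (i = 2; i <= NF; i++) print $i }' "$1" | sort -u
}

sources_file=/etc/apt/sources.list.d/non-free.sources
if [ -r "$sources_file" ]; then
    list_components "$sources_file"
fi
```

Running apt-cache policy followed by a package name afterwards shows whether apt can now see that package.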

Step 3 – Install the NVIDIA Driver and CUDA Toolkit

3.1 Bring in the NVIDIA Keyring

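NVIDIA publishes a keyring package that installs both the repository signing key and an apt source entry for its CUDA repository, so your system can verify the authenticity of the driver packages. The URL below points at NVIDIA's Debian 12 repository. Refresh the package lists afterwards so apt sees the new repository.

wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update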

3.2 Install Driver Packages

apt -V install nvidia-driver-cuda nvidia-kernel-dkms

The nvidia-driver-cuda meta-package pulls in the driver components needed for CUDA, and nvidia-kernel-dkms provides the kernel module source. DKMS (Dynamic Kernel Module Support) builds the module for the running kernel and rebuilds it when a new kernel and its headers are installed. The CUDA toolkit itself is installed separately in step 3.4.

3.3 Regenerate Initramfs and Update GRUB

After installing the driver modules, regenerate the initramfs for all installed kernels so that it picks up the new modules, and refresh the GRUB configuration.

update-initramfs -u -k all
update-grub

Reboot the machine to let the new driver take effect.
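
After the reboot, it is worth checking that the module actually loaded before moving on. A minimal sketch, assuming the module is named nvidia; nvidia_module_loaded is a hypothetical helper.

```shell
#!/bin/sh
# nvidia_module_loaded: hypothetical helper. Succeeds when a module named
# "nvidia" appears in a /proc/modules-style listing (default: /proc/modules).
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
    # nvidia-smi ships with the driver and lists the GPUs it can see.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    fi
else
    echo "nvidia kernel module is not loaded"
fi
```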

3.4 Install the CUDA Toolkit

With the driver in place, install the toolkit components that provide nvcc, libraries, and headers used by llama.cpp.

apt install nvidia-cuda-toolkit
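
To see which toolkit release ended up installed, you can parse the output of nvcc --version. This is a sketch; cuda_release is a hypothetical helper, and it assumes the usual "release X.Y" wording in that output.

```shell
#!/bin/sh
# cuda_release: hypothetical helper. Reads `nvcc --version` output on stdin
# and prints the toolkit release number found after the word "release".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA toolkit release: $(nvcc --version | cuda_release)"
else
    echo "nvcc not found on PATH"
fi
```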

Step 4 – Install Build Dependencies

The build process for llama.cpp requires several libraries and developer tools. Installing them up front keeps the compile step straightforward.

apt install libtcmalloc-minimal4 libcurl4-openssl-dev glslc cmake make git pkg-config

These packages provide the tcmalloc memory allocator, the libcurl development files (required by -DLLAMA_CURL=ON), the GLSL shader compiler, CMake, Make, Git, and pkg-config.

Step 5 – Clone and Compile llama.cpp

With the environment prepared, fetch the source code and build it.

cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_CUDA=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON
make -j8
echo 'export PATH=$PATH:'$(realpath bin) >> ~/.bashrc

After the build finishes, open a new shell (or run source ~/.bashrc) so the newly built binaries are on your PATH.
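
The echo line above appends to ~/.bashrc every time it runs, so repeating the build steps leaves duplicate entries. One possible way around that is sketched below; add_path_once is a hypothetical helper, not part of llama.cpp.

```shell
#!/bin/sh
# add_path_once: hypothetical helper. Appends an export line for DIR to
# RCFILE unless that exact line is already present.
add_path_once() {
    dir=$1
    rcfile=$2
    line="export PATH=\$PATH:$dir"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF -- "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
}

# Usage, from inside the build directory:
#   add_path_once "$(realpath bin)" ~/.bashrc
```

Once a new shell is open, command -v llama-cli should print the path of the freshly built binary.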

Step 6 – Keep the Driver in Sync with Kernel Updates

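Kernel upgrades are common, and the driver module must be rebuilt against each new kernel. DKMS normally does this on its own when the headers for the new kernel are installed. If the driver fails to load after an upgrade, boot into the new kernel (so that uname -r reports it) and run the following routine.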

apt install linux-headers-$(uname -r)
apt install --reinstall nvidia-driver-cuda nvidia-kernel-dkms
apt install nvidia-cuda-toolkit
update-initramfs -u -k all
update-grub

Running this sequence after a kernel upgrade, followed by a reboot, should get the driver loading correctly again.
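
To check whether DKMS has built the module for the running kernel, you can filter the output of dkms status. This is a sketch; dkms_built_for is a hypothetical helper, and it assumes the module name starts with nvidia and that built modules are marked "installed", which may differ between DKMS versions.

```shell
#!/bin/sh
# dkms_built_for: hypothetical helper. Reads `dkms status` output on stdin
# and succeeds if an nvidia module is marked installed for the given kernel.
dkms_built_for() {
    kernel=$1
    grep -i '^nvidia' | grep -F -- "$kernel" | grep -q 'installed'
}

if command -v dkms >/dev/null 2>&1; then
    if dkms status | dkms_built_for "$(uname -r)"; then
        echo "nvidia module is built for $(uname -r)"
    else
        echo "nvidia module is NOT built for $(uname -r)"
    fi
fi
```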

Step 7 – Updating the Source Tree

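When the upstream llama.cpp project publishes a new release or a bug fix, refresh your local copy and rebuild. Note that git clean -xdf deletes every untracked and ignored file in the checkout, including the old build directory, so move any models or notes stored inside it elsewhere first.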

cd ~
cd llama.cpp/

# Clean the working directory
git clean -xdf
mkdir build

# Pull the latest changes and submodules
git pull
git submodule update --recursive

# Rebuild
cd build/
cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_CUDA=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON
make -j8
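
Since a full clean and rebuild takes a while, it can be useful to check first whether upstream has anything new. A minimal sketch, assuming the checkout tracks an upstream branch (as a plain git clone does); commits_behind is a hypothetical helper.

```shell
#!/bin/sh
# commits_behind: hypothetical helper. Fetches from the remote and prints
# how many commits the current branch of the repository in DIR is behind
# its upstream branch.
commits_behind() {
    git -C "$1" fetch --quiet &&
        git -C "$1" rev-list --count 'HEAD..@{upstream}'
}

# Usage:
#   if [ "$(commits_behind ~/llama.cpp)" -gt 0 ]; then
#       echo "update available"
#   fi
```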


Building llama.cpp with CUDA on Debian 13 “Trixie”