LLMs and GPUs

In recent years, the landscape of artificial intelligence has shifted dramatically with the rise of large language models (LLMs). These models are incredibly powerful but also resource-intensive — typically requiring high-end GPUs like NVIDIA’s RTX 4090s or AMD’s latest Radeon Instinct series to run effectively.

But what if you don’t have access to such hardware? What if your budget is limited, or you already own older GPUs like the AMD Radeon RX 580? Surprisingly, there’s still a way to get meaningful performance out of these aging cards — especially with the right software stack and a bit of ingenuity.

This guide walks through how to leverage the AMD Radeon RX 580 — an aging yet capable GPU — to run large language models using llama.cpp via Vulkan API support, even though ROCm (the newer AMD compute framework) no longer supports it.

Hardware Overview: The Radeon RX 580

The Radeon RX 580 is part of AMD’s Polaris generation, released in 2016. While not cutting-edge today, it still offers:

8 GB GDDR5 memory (sufficient for many smaller models)
2,304 stream processors
14nm process
Good PCIe 3.0 bandwidth

Although it’s no longer officially supported in newer versions of ROCm, the RX 580 retains full compatibility with Vulkan drivers, making it ideal for running modern AI inference engines.

Software Stack: llama.cpp + Vulkan

llama.cpp is a lightweight C++ implementation of the LLaMA architecture that allows you to run LLMs directly on your CPU or GPU.

It supports multiple backends including:

CPU (default)
CUDA (NVIDIA)
Metal (Apple Silicon)
Vulkan (AMD & Intel GPUs)

By enabling Vulkan support during compilation, we can tap into the RX 580’s full potential.

Installing Vulkan Drivers on Debian 12

Before we build llama.cpp, we need to ensure the system has proper Vulkan support:

sudo apt install vulkan-tools libtcmalloc-minimal4 libcurl4-openssl-dev glslc cmake make git pkg-config libvulkan-dev

These packages provide:

vulkan-tools: Tools for testing Vulkan applications
libtcmalloc-minimal4: Memory allocator for performance
libcurl4-openssl-dev: For downloading models via HTTP
glslc: GLSL shader compiler (needed for Vulkan)
cmake, make, git, pkg-config: Build dependencies
libvulkan-dev: Required for Vulkan development

Once installed, you can verify Vulkan support:

vulkaninfo | grep -i RX

You should see your GPU listed in the output.

Installing llama.cpp with Vulkan Support

Let’s walk through the full installation process.

Step 1: Clone the Repository

cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build

Step 2: Configure CMake for Vulkan

Build llama.cpp with Vulkan enabled:

cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON

This configuration enables:

AVX instructions for faster CPU ops
AVX2 / VNNI optimizations (for better performance on supported CPUs)
Vulkan backend support for AMD GPUs
Curl support for downloading GGUF models from Hugging Face

Step 3: Compile and Install

make -j8
echo 'export PATH=$PATH:'$(realpath bin) >> ~/.bashrc

Log out and back in to update your environment variables so llama-cli and llama-server are available in your terminal.

Running Models with llama-cli and llama-server

Now that everything is built, let’s test it out with some sample commands.

Using llama-cli

Run a model using the CLI interface:

llama-cli -m deepseek-r1:8B --device Vulkan0 -ngl 99

This command:

Loads a model named deepseek-r1:8B
Uses device Vulkan0 (first Vulkan-compatible GPU detected)
Sets -ngl 99 to offload all layers to GPU

You can optionally specify the full model path or use Hugging Face URLs (with the -hf flag if supported).

Using llama-server

To expose your model via an API endpoint:

llama-server --host 0.0.0.0 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_M --device Vulkan0 -ngl 99

This starts a server listening on all interfaces (0.0.0.0) and uses:

unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_M as the model (quantized to 4-bit)
Device Vulkan0
All layers (-ngl 99) loaded into GPU memory

Multi-GPU Setup

If you have more than one RX 580 (or other Vulkan-compatible GPUs), you can split the model across multiple devices:

llama-server --host 0.0.0.0 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q8_K_XL --device Vulkan0,Vulkan1

And for even larger models, like Qwen3-Coder-30B-A3B-Instruct-GGUF:

llama-server \
  --host 0.0.0.0 \
  -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q8_K_XL \
  -ngl 99 \
  --threads -1 \
  --ctx-size 32684 \
  --temp 0.7 \
  --min-p 0.0 \
  --top-p 0.80 \
  --top-k 20 \
  --repeat-penalty 1.05 \
  --device Vulkan0,Vulkan1,Vulkan2,Vulkan3,Vulkan4

This will use up to five GPUs, distributing load across them and enabling inference of 30B parameter models.

Updating llama.cpp

When new updates are released, just run:

cd ~/llama.cpp/
git clean -xdf
git pull
git submodule update --recursive
cd build/
cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON
make -j8

Performance Notes: RX 580 Limitations and Workarounds

While the RX 580 isn’t the fastest GPU on the market, it can still run impressive models when properly configured. Here are some key takeaways:

Small to medium-sized models (e.g., 7B–13B parameters) run smoothly with minimal latency.
Larger models (like 30B) require:
Quantized weights (Q4, Q8_K_XL)
Multi-GPU setup
Longer wait times for responses
Threading optimization (--threads -1)
Higher context sizes (--ctx-size)

Despite limitations, a cluster of 5 RX 580s can handle a 30B parameter model, which is quite remarkable for such older hardware.

Final Thoughts

The RX 580 may be old, but it still holds value in the world of AI inference. Thanks to the llama.cpp project’s Vulkan backend support, it’s possible to run large language models on low-cost hardware that would otherwise be unusable for AI workloads.

With careful configuration and the right software stack, you can build a capable local LLM inference rig using nothing more than a few secondhand GPUs. Whether you’re training, experimenting, or just curious about AI, this setup provides a great foundation to get started.

If you’re looking to repurpose an old rig or build a cost-effective edge AI box, the RX 580 + Vulkan + llama.cpp combination is worth exploring — and you might be surprised at what it can do.

Have questions or need help setting up your own RX 580-based LLM cluster? Leave a comment below or share your experience in the comments!

Supabase and Proxmox

In today’s rapidly evolving tech landscape, developers often need flexible and scalable solutions for hosting applications — especially when looking to self-host services like Supabase. One powerful approach is to deploy Supabase using a Proxmox VE Docker VM. This setup not only offers flexibility and isolation but also allows for easy updates and maintenance.

Why Use Proxmox VE?

Proxmox VE stands out as a free and open-source virtualization platform that supports both KVM and LXC containers. What makes it particularly appealing for developers is its ability to manage virtual machines (VMs) with full OS support, unlike some containerized alternatives.

Furthermore, Proxmox allows for Docker in a VM, which means you get the best of both worlds: the isolation and management of a VM with the lightweight efficiency of Docker containers. Since Docker in LXC containers isn’t as straightforward to maintain, deploying Docker within a Proxmox VM is the recommended way to go.

Deploying a Docker VM in Proxmox

To simplify the process, the community-scripts project provides a convenient script to create a Docker-ready VM. You can find detailed documentation at https://community-scripts.github.io/ProxmoxVE/scripts?id=docker-vm.

Here’s how to get started:

bash -c "$(curl -fsSL https://raw.githubusercontent.com/community-scripts/ProxmoxVE/main/vm/docker-vm.sh)"

This script will set up a Docker-enabled VM in Proxmox, complete with the necessary tools and configurations. Once deployed, you can access the VM via SSH or console, using the default login:

Username: root
Password: docker

Of course, you should change the default password immediately to ensure security.

🔐 Security Tip: Always update default credentials right after deployment to prevent unauthorized access.

Installing Supabase in Your Docker VM

Now that you have your Docker VM running, it’s time to install Supabase — an open-source Firebase alternative that provides real-time databases, authentication, and more. For detailed installation instructions, refer to the official Supabase documentation: https://supabase.com/docs/guides/self-hosting/docker

Step-by-Step Installation

First, you need to clone the Supabase repository and set up your project directory:

git clone --depth 1 https://github.com/supabase/supabase
mkdir supabase-project

Your folder structure should now look like this:

.
├── supabase
└── supabase-project

Next, copy the necessary Docker Compose files and environment variables:

cp -rf supabase/docker/* supabase-project
cp supabase/docker/.env.example supabase-project/.env

Now, switch to your project directory and pull the latest images:

cd supabase-project
docker compose pull

Then, start all the services in detached mode:

docker compose up -d

You can verify that all services are running correctly with:

docker compose ps

If any service is not running, try starting it manually:

docker compose start <service-name>

Accessing Supabase Studio

Once everything is up and running, you can access Supabase Studio — the admin UI for managing your Supabase project — through the API gateway on port 8000.

For example, if your VM’s IP address is 192.168.1.100, you would visit:

http://192.168.1.100:8000

By default, the login credentials are:

Username: supabase
Password: this_password_is_insecure_and_should_be_updated

As mentioned earlier, it’s critical to update the default password immediately for security reasons.

Why Choose This Approach?

Using Proxmox VE to host Supabase offers several advantages:

Full OS support: Unlike containers, VMs allow for full control over the guest OS.
Easy updates: VMs can be upgraded independently, making them more manageable than LXC containers.
Isolation: Each VM functions as a separate unit, improving stability and security.
Scalability: You can easily scale resources or replicate your setup for development and production.

Conclusion

Deploying Supabase in a Proxmox VE Docker VM is an efficient and secure way to self-host your Supabase infrastructure. It leverages the strengths of both virtualization and containerization, offering scalability and maintenance benefits.

Whether you’re a developer looking to host a custom backend or a team managing multiple projects, this setup provides an excellent foundation.

Dad Hacks

Monthly Archives: August 2025

Running Large Language Models on Cheap Old RX 580 GPUs with llama.cpp and Vulkan