Tag Archives: RX580

AI Image Generation on RX 580 Using Vulkan: A Cost-Effective Solution

This guide explores how to leverage the AMD Radeon RX 580 graphics card for AI image generation using Vulkan compute capabilities, without requiring the ROCm software stack. By utilizing stable-diffusion.cpp compiled with Vulkan support, users can take advantage of their existing hardware to run modern AI image generation models.

The approach focuses on maximizing the capabilities of older but still capable hardware, specifically targeting the 8GB VRAM of the RX 580 for efficient model execution. This method provides a cost-effective alternative to more expensive GPU options while maintaining reasonable performance for image generation tasks.

Prerequisites and Vulkan Setup

Before beginning the AI image generation setup, it is essential to have Vulkan properly installed and configured on the system. The installation process for Vulkan can be found in our related guide: Running Large Language Models on Cheap Old RX 580 GPUs with llama.cpp and Vulkan.

This prerequisite ensures that the system has the necessary graphics runtime and compute capabilities required for the Vulkan-based AI image generation framework. The Vulkan API provides a cross-platform solution for leveraging GPU compute resources, making it ideal for running AI workloads on AMD hardware.
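As a quick sanity check, assuming the vulkan-tools package from the guide above is installed, you can confirm the card is visible to Vulkan before proceeding (the exact device name string varies by Mesa version):

# newer vulkaninfo builds support a compact summary; under the open-source
# Mesa driver the RX 580 typically appears as a RADV POLARIS device
vulkaninfo --summary | grep -i deviceName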

Installing stable-diffusion.cpp with Vulkan Support

The core of this setup involves compiling and installing stable-diffusion.cpp with Vulkan support enabled. stable-diffusion.cpp is a lightweight C++ implementation of Stable Diffusion and related diffusion models (including FLUX) that can use Vulkan compute for GPU-accelerated image generation.

The installation begins by cloning the repository from GitHub, which includes all necessary submodules and dependencies:

git clone --recursive https://github.com/leejet/stable-diffusion.cpp

After cloning, navigate into the project directory and create a build directory to maintain clean separation between source and compiled files:

cd stable-diffusion.cpp
mkdir build && cd build

The compilation process requires enabling Vulkan support through CMake configuration. This step is crucial for ensuring that the application can utilize the GPU compute capabilities:

cmake .. -DSD_VULKAN=ON

Following the CMake configuration, build the project in Release mode to optimize performance:

cmake --build . --config Release

This compilation process generates the necessary executables and libraries required for running AI image generation tasks with Vulkan acceleration.
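Before moving on, it is worth confirming the binary runs. In a default build the executable lands in bin/ inside the build directory (adjust the path if your CMake layout differs):

./bin/sd --help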

Model Preparation and Hardware Considerations

To run AI image generation on the RX 580, users must download a diffusion model in quantized GGUF format; quantization shrinks the weights enough to fit and execute within limited VRAM. Memory constraints require careful attention, as each instance runs on a single GPU and cannot pool VRAM across multiple cards.

The 8GB VRAM of the RX 580 limits the size of models that can be fully loaded into memory. Some components of the generation process must be offloaded to the CPU, which affects overall performance but allows for operation within hardware constraints.

Alongside the GGUF diffusion model, a complete setup needs several supporting components, typically distributed as safetensors files: a VAE, a CLIP-L text encoder, and a T5XXL text encoder. These files must be organized in a directory the application can reach at run time.
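The commands in the next section assume a layout along these lines (the SD-Models directory name and the file names match the examples that follow; substitute your own):

SD-Models/
├── flux1-schnell-q4_0.gguf                    # quantized diffusion model (GGUF)
├── flux1-dev-q4_0.gguf
├── ae.safetensors                             # VAE
├── clip_l.safetensors                         # CLIP-L text encoder
├── t5xxl_fp16.safetensors                     # T5XXL text encoder
└── realism_lora_comfy_converted.safetensors   # optional LoRA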

Sample Usage Commands

Once the system is properly configured with stable-diffusion.cpp compiled with Vulkan support, users can begin generating images using various command-line options. The following examples demonstrate different approaches to image generation with varying model configurations:

sd --diffusion-model SD-Models/flux1-schnell-q4_0.gguf \
   --vae SD-Models/ae.safetensors \
   --clip_l SD-Models/clip_l.safetensors \
   --t5xxl SD-Models/t5xxl_fp16.safetensors \
   -p "a lovely beagle holding a sign says 'hello'" \
   --cfg-scale 1.0 --sampling-method euler -v --steps 4 --clip-on-cpu

This command demonstrates basic image generation with the flux1-schnell model, using CPU offloading for CLIP processing to accommodate memory limitations.

sd --diffusion-model SD-Models/flux1-dev-q4_0.gguf \
   --vae SD-Models/ae.safetensors \
   --clip_l SD-Models/clip_l.safetensors \
   --t5xxl SD-Models/t5xxl_fp16.safetensors \
   -p "a lovely beagle holding a sign says 'hello'" \
   --cfg-scale 1.0 --sampling-method euler -v --steps 4 --clip-on-cpu

This example swaps in the flux1-dev model, which generally produces higher-quality output than the speed-focused schnell variant, though it is usually run with more sampling steps.

For users interested in enhanced realism or artistic styles, LoRA (Low-Rank Adaptation) models can be incorporated:

sd --diffusion-model SD-Models/flux1-dev-q4_0.gguf \
   --vae SD-Models/ae.safetensors \
   --clip_l SD-Models/clip_l.safetensors \
   --t5xxl SD-Models/t5xxl_fp16.safetensors \
   -p "a lovely beagle holding a sign says 'flux.cpp'<lora:realism_lora_comfy_converted:1>" \
   --cfg-scale 1.0 --sampling-method euler -v \
   --lora-model-dir SD-Models --clip-on-cpu

This command applies the LoRA inline via the <lora:name:weight> tag in the prompt; the corresponding file is resolved from the directory passed to --lora-model-dir.

The final example combines the flux1-schnell model with LoRA support:

sd --diffusion-model SD-Models/flux1-schnell-q4_0.gguf \
   --vae SD-Models/ae.safetensors \
   --clip_l SD-Models/clip_l.safetensors \
   --t5xxl SD-Models/t5xxl_fp16.safetensors \
   -p "a lovely beagle holding a sign says 'flux.cpp'<lora:realism_lora_comfy_converted:1>" \
   --cfg-scale 1.0 --sampling-method euler -v \
   --lora-model-dir SD-Models --clip-on-cpu

These commands illustrate the flexibility of the stable-diffusion.cpp framework in supporting various model configurations and enhancement techniques while working within the constraints of the RX 580’s hardware specifications.

Performance Considerations

The performance of AI image generation on the RX 580 with Vulkan support will vary based on several factors including model size, generation parameters, and system configuration. The 8GB VRAM limitation means that larger models may require additional CPU offloading or reduced resolution settings to function effectively.
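As a sketch of how to trade speed for memory, recent stable-diffusion.cpp builds expose flags for reducing resolution and offloading stages to the CPU; verify the exact flag names against sd --help for your build:

# render at 512x512 and keep the CLIP and VAE stages on the CPU;
# --vae-tiling decodes the image in tiles to cap peak VRAM use
sd --diffusion-model SD-Models/flux1-schnell-q4_0.gguf \
   --vae SD-Models/ae.safetensors \
   --clip_l SD-Models/clip_l.safetensors \
   --t5xxl SD-Models/t5xxl_fp16.safetensors \
   -p "a lovely beagle" -W 512 -H 512 --steps 4 --cfg-scale 1.0 \
   --sampling-method euler --clip-on-cpu --vae-on-cpu --vae-tiling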

Expect longer generation times than on newer, more powerful GPUs, but the approach remains viable for older hardware: the Vulkan backend runs the compute-heavy diffusion steps on the GPU's parallel hardware and is substantially faster than CPU-only inference.

With these steps completed, you can run AI image generation on your RX 580 graphics card using Vulkan compute capabilities. This setup provides an accessible pathway for leveraging existing hardware investments for modern AI applications without requiring expensive upgrades or specialized software stacks like ROCm.

Proxmox VE GPU Passthrough on an AMD RX 560

This guide walks you through every step required to expose an AMD RX 560 graphics card to a Proxmox Virtual Environment (VE) virtual machine. The same procedure applies to other AMD GPUs such as the RX 570, RX 580, RX 7600, RX 7700, RX 7900 XT, and many others.

1. Pulling the ROM from the GPU

  1. Install the GPU
    Insert the card into any PCI‑e slot (some systems require it not to be the first slot).
  2. Boot Proxmox VE
    Log into the console.
  3. Locate the device
    Run lspci -nnk and look for a line similar to:
   01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] [1002:67ef] (rev e5)
     Subsystem: Gigabyte Technology Co., Ltd Device [1458:230a]
     Kernel driver in use: amdgpu
     Kernel modules: amdgpu
   01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]
     Subsystem: Gigabyte Technology Co., Ltd Device [1458:aae0]
     Kernel driver in use: snd_hda_intel
     Kernel modules: snd_hda_intel

The PCI bus address of the VGA controller is 01:00.0. To form the sysfs path, prefix the address with the PCI domain (0000:):

   /sys/bus/pci/devices/0000:01:00.0/
  4. Extract the ROM
   cd /sys/bus/pci/devices/0000\:01\:00.0/
   echo 1 > rom
   cat rom > /usr/share/kvm/RX560-4096.rom
   echo 0 > rom

The file /usr/share/kvm/RX560-4096.rom now contains the GPU ROM.
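A quick sanity check on the dump is worthwhile before relying on it (sizes vary by card):

   ls -lh /usr/share/kvm/RX560-4096.rom
   file /usr/share/kvm/RX560-4096.rom    # usually identified as a BIOS/ROM extension image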


2. Configuring the Proxmox Server for PCIe Passthrough

  1. Load required kernel modules
    Edit /etc/modules and add:
   vfio
   vfio_iommu_type1
   vfio_pci
   vfio_virqfd
     (On newer kernels, such as those shipped with Proxmox VE 8, vfio_virqfd has been merged into vfio and can be omitted.)
  2. Blacklist the native driver
    Edit /etc/modprobe.d/pve-blacklist.conf and add:
   blacklist amdgpu
  3. Create VFIO configuration
    Create /etc/modprobe.d/vfio.conf with:
   options vfio-pci ids=1002:67ef,1002:aae0 disable_vga=1
   softdep amdgpu pre: vfio-pci

The IDs come from the lspci -nnk output: 1002:67ef (VGA) and 1002:aae0 (Audio).
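If you need to re-read these values on your own system, the vendor:device pairs are the bracketed IDs printed for both functions of the card:

   lspci -nn -s 01:00    # prints both the VGA (01:00.0) and audio (01:00.1) functions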

  4. Enable IOMMU in the kernel
    Edit /etc/default/grub and add either intel_iommu=on or amd_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT line, e.g.:
   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
  5. Apply changes
   update-grub
   update-initramfs -u -k all
  6. Reboot
    Shut down the Proxmox host. If necessary, move the GPU to the first PCI‑e slot before powering on again.
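After the reboot, verify that the IOMMU is active and that both functions of the card are bound to vfio-pci rather than amdgpu (exact kernel messages vary by platform):

   dmesg | grep -i -e iommu -e vfio      # look for IOMMU-enabled and vfio-pci messages
   lspci -nnk -s 01:00                   # "Kernel driver in use" should read vfio-pci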

3. Configuring a VM for PCIe Passthrough

  1. Create the VM
    Use the usual VM creation flow in Proxmox, but set the Machine type to q35.
  2. Add the GPU as a Raw Device
    In the VM hardware list:
  • Type: PCI Device
  • Bus address: the same as the GPU (e.g., 0000:01:00)
  • Enable ROM‑Bar, PCI‑Express, and Primary GPU
  3. Edit the VM configuration file
    For a VM with ID 101, edit:
   /etc/pve/nodes/pve/qemu-server/101.conf

Add or modify the hostpci0 line to reference the ROM file:

   hostpci0: 0000:01:00,pcie=1,x-vga=1,romfile=RX560-4096.rom
  4. Start the VM
    The guest will now use the passthrough GPU.

Following these steps will give you a fully functional AMD RX 560 passthrough in Proxmox VE, and the same methodology works for other AMD GPUs such as the RX 570, RX 580, RX 7600, RX 7700, RX 7900 XT, etc.


Running Large Language Models on Cheap Old RX 580 GPUs with llama.cpp and Vulkan

LLMs and GPUs

In recent years, the landscape of artificial intelligence has shifted dramatically with the rise of large language models (LLMs). These models are incredibly powerful but also resource-intensive — typically requiring high-end GPUs like NVIDIA’s RTX 4090s or AMD’s latest Radeon Instinct series to run effectively.

But what if you don’t have access to such hardware? What if your budget is limited, or you already own older GPUs like the AMD Radeon RX 580? Surprisingly, there’s still a way to get meaningful performance out of these aging cards — especially with the right software stack and a bit of ingenuity.

This guide walks through how to leverage the AMD Radeon RX 580 — an aging yet capable GPU — to run large language models using llama.cpp via Vulkan API support, even though ROCm (the newer AMD compute framework) no longer supports it.


Hardware Overview: The Radeon RX 580

The Radeon RX 580 is built on AMD's Polaris architecture and was released in 2017 as a refresh of 2016's RX 480. While not cutting-edge today, it still offers:

  • 8 GB GDDR5 memory (sufficient for many smaller models)
  • 2,304 stream processors
  • 14nm process
  • Good PCIe 3.0 bandwidth

Although it’s no longer officially supported in newer versions of ROCm, the RX 580 retains full compatibility with Vulkan drivers, making it ideal for running modern AI inference engines.


Software Stack: llama.cpp + Vulkan

llama.cpp is a lightweight C++ implementation of the LLaMA architecture that allows you to run LLMs directly on your CPU or GPU.

It supports multiple backends including:

  • CPU (default)
  • CUDA (NVIDIA)
  • Metal (Apple Silicon)
  • Vulkan (AMD & Intel GPUs)

By enabling Vulkan support during compilation, we can tap into the RX 580’s full potential.


Installing Vulkan Drivers on Debian 12

Before we build llama.cpp, we need to ensure the system has proper Vulkan support:

sudo apt install vulkan-tools libtcmalloc-minimal4 libcurl4-openssl-dev glslc cmake make git pkg-config libvulkan-dev

These packages provide:

  • vulkan-tools: Tools for testing Vulkan applications
  • libtcmalloc-minimal4: Memory allocator for performance
  • libcurl4-openssl-dev: For downloading models via HTTP
  • glslc: GLSL shader compiler (needed for Vulkan)
  • cmake, make, git, pkg-config: Build dependencies
  • libvulkan-dev: Required for Vulkan development

Once installed, you can verify Vulkan support:

vulkaninfo | grep -i RX

You should see your GPU listed in the output.


Installing llama.cpp with Vulkan Support

Let’s walk through the full installation process.

Step 1: Clone the Repository

cd ~
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build

Step 2: Configure CMake for Vulkan

Build llama.cpp with Vulkan enabled:

cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON

This configuration enables:

  • AVX instructions for faster CPU ops
  • AVX2 / VNNI optimizations (for better performance on supported CPUs)
  • Vulkan backend support for AMD GPUs
  • Curl support for downloading GGUF models from Hugging Face

Step 3: Compile and Install

make -j8
echo 'export PATH=$PATH:'$(realpath bin) >> ~/.bashrc

Log out and back in to update your environment variables so llama-cli and llama-server are available in your terminal.
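A quick way to confirm the binaries are reachable after re-logging in (recent llama.cpp builds print their build number with --version):

command -v llama-cli llama-server
llama-cli --version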


Running Models with llama-cli and llama-server

Now that everything is built, let’s test it out with some sample commands.

Using llama-cli

Run a model using the CLI interface:

llama-cli -m deepseek-r1:8B --device Vulkan0 -ngl 99

This command:

  • Loads a model named deepseek-r1:8B
  • Uses device Vulkan0 (first Vulkan-compatible GPU detected)
  • Sets -ngl 99 to offload all layers to GPU

Note that -m expects a path to a local GGUF file; to download a model directly from Hugging Face instead, use the -hf flag (available when llama.cpp is built with -DLLAMA_CURL=ON, as above).

Using llama-server

To expose your model via an API endpoint:

llama-server --host 0.0.0.0 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_M --device Vulkan0 -ngl 99

This starts a server listening on all interfaces (0.0.0.0) and uses:

  • unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_M as the model (quantized to 4-bit)
  • Device Vulkan0
  • All layers (-ngl 99) loaded into GPU memory
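Once the server is running (it listens on port 8080 by default), it exposes an OpenAI-compatible HTTP API, so a quick smoke test from another shell might look like:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'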

Multi-GPU Setup

If you have more than one RX 580 (or other Vulkan-compatible GPUs), you can split the model across multiple devices:

llama-server --host 0.0.0.0 -hf unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q8_K_XL --device Vulkan0,Vulkan1

And for even larger models, like Qwen3-Coder-30B-A3B-Instruct-GGUF:

llama-server \
  --host 0.0.0.0 \
  -hf unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q8_K_XL \
  -ngl 99 \
  --threads -1 \
  --ctx-size 32768 \
  --temp 0.7 \
  --min-p 0.0 \
  --top-p 0.80 \
  --top-k 20 \
  --repeat-penalty 1.05 \
  --device Vulkan0,Vulkan1,Vulkan2,Vulkan3,Vulkan4

This will use up to five GPUs, distributing load across them and enabling inference of 30B parameter models.
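Recent llama.cpp builds can also enumerate the backends they see, which is handy for confirming that every card is detected before launching a large model (check llama-server --help if the flag is missing in your build):

llama-server --list-devices    # should list Vulkan0 through Vulkan4 on a five-GPU rig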


Updating llama.cpp

When new updates are released, just run:

cd ~/llama.cpp/
git clean -xdf
git pull
git submodule update --init --recursive
cd build/
cmake .. \
  -DGGML_AVX=ON \
  -DGGML_AVX_VNNI=ON \
  -DGGML_AVX2=ON \
  -DGGML_VULKAN=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_CURL=ON
make -j8

Performance Notes: RX 580 Limitations and Workarounds

While the RX 580 isn’t the fastest GPU on the market, it can still run impressive models when properly configured. Here are some key takeaways:

  • Small to medium-sized models (e.g., 7B–13B parameters) run smoothly with minimal latency.
  • Larger models (like 30B) require:
    • Quantized weights (Q4, Q8_K_XL)
    • A multi-GPU setup
    • Longer wait times for responses
    • Threading optimization (--threads -1)
    • Larger context sizes (--ctx-size)

Despite these limitations, a cluster of five RX 580s can handle a 30B-parameter model, which is quite remarkable for such old hardware. The arithmetic checks out: at roughly one byte per weight, an 8-bit quantized 30B model needs on the order of 30 GB for its weights, while five 8 GB cards supply 40 GB of combined VRAM, leaving headroom for the KV cache and activations.


Final Thoughts

The RX 580 may be old, but it still holds value in the world of AI inference. Thanks to the llama.cpp project’s Vulkan backend support, it’s possible to run large language models on low-cost hardware that would otherwise be unusable for AI workloads.

With careful configuration and the right software stack, you can build a capable local LLM inference rig using nothing more than a few secondhand GPUs. Whether you're prototyping, experimenting, or just curious about AI, this setup provides a great foundation to get started.

If you’re looking to repurpose an old rig or build a cost-effective edge AI box, the RX 580 + Vulkan + llama.cpp combination is worth exploring — and you might be surprised at what it can do.


Have questions or need help setting up your own RX 580-based LLM cluster? Leave a comment below and share your experience!