
Multi-GPU Setup

Machines with multiple GPUs can run one Prover Node instance per GPU. Each instance runs in its own container with a dedicated Fermah home directory and its own machine secret.

Each instance must have a unique machine secret. Sharing the same secret across instances will cause registration conflicts.

Configuration

Your prover-node-config.toml must include an entry for each GPU on the machine:

[[hardware.gpus]]
price = "117"
resource.gpuId = "unknown-gpu-0"

[[hardware.gpus.resource.specs]]
VRAM = 25769803776

[[hardware.gpus]]
price = "117"
resource.gpuId = "unknown-gpu-1"

[[hardware.gpus.resource.specs]]
VRAM = 25769803776

If your machine has more than two GPUs, add additional [[hardware.gpus]] entries following the same pattern, incrementing the gpuId index.
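For example, a machine with a third GPU would append one more entry (a sketch following the pattern above; adjust `price` and `VRAM` to your hardware):

```toml
[[hardware.gpus]]
price = "117"
resource.gpuId = "unknown-gpu-2"

[[hardware.gpus.resource.specs]]
VRAM = 25769803776
```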

Container Image

Build a minimal container image for the prover node:

FROM nvidia/cuda:12.9.1-runtime-ubuntu24.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates tini && \
    rm -rf /var/lib/apt/lists/*

ENV HOME=/root
WORKDIR /root

ENTRYPOINT ["/usr/bin/tini", "--", "/root/.fermah/bin/fpn"]
docker build -t fermah-fpn:24.04 .

Replace the base image with nvidia/cuda:12.2.0-runtime-ubuntu22.04 if you are running CUDA 12.2.

Preparing Home Directories

Each GPU instance needs its own Fermah home directory with a unique machine secret. Generate a new secret before copying each directory:

# GPU 0: generate the first machine secret
prover-node gen-machine-secret

# Copy the home directory for GPU 1, then generate a new secret
cp -r ~/.fermah ~/.fermah-gpu1
prover-node gen-machine-secret
# The base ~/.fermah now has a new secret (GPU 1)

# Swap so GPU 0 keeps the original
mv ~/.fermah ~/.fermah-tmp
mv ~/.fermah-gpu1 ~/.fermah
mv ~/.fermah-tmp ~/.fermah-gpu1

For each additional GPU, repeat the pattern: copy the base directory, generate a new secret, then swap so the base directory keeps GPU 0's original secret.
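The repeated copy/regenerate/swap steps can be scripted. The sketch below exercises the same pattern in a throwaway directory: `gen_secret` is a stand-in for `prover-node gen-machine-secret`, and the `machine_secret` file name is illustrative only (the prover node manages its own files inside `~/.fermah`). On a real machine, drop the stub and run the same loop against `$HOME/.fermah`.

```shell
#!/usr/bin/env bash
# Sketch of the copy -> regenerate -> swap pattern for N GPUs,
# run against a throwaway directory with a stub secret generator.
set -eu

WORK="$(mktemp -d)"
BASE="$WORK/.fermah"
NUM_GPUS=3

gen_secret() {
  # Stand-in for `prover-node gen-machine-secret`: write a fresh
  # random hex secret into the base directory.
  { head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'; echo; } > "$BASE/machine_secret"
}

mkdir -p "$BASE"
gen_secret                                  # secret for GPU 0

for i in $(seq 1 $((NUM_GPUS - 1))); do
  cp -r "$BASE" "$WORK/.fermah-gpu$i"       # snapshot the current secret
  gen_secret                                # base now holds a brand-new secret
  mv "$BASE" "$WORK/.fermah-tmp"            # swap so the base keeps GPU 0's secret
  mv "$WORK/.fermah-gpu$i" "$BASE"
  mv "$WORK/.fermah-tmp" "$WORK/.fermah-gpu$i"
done

# Every directory should now hold a distinct secret.
UNIQUE="$(sort -u "$BASE/machine_secret" "$WORK"/.fermah-gpu*/machine_secret | wc -l | tr -d ' ')"
echo "unique secrets: $UNIQUE"
```

After the loop, the base directory still holds GPU 0's original secret and each `.fermah-gpu<i>` directory holds its own, which is exactly the invariant the registration step depends on.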

After creating each home directory, you must register each instance independently.

Docker Compose

Docker Compose is the recommended approach for multi-GPU setups. Use the device_ids field to pin each service to a specific GPU.

This requires the NVIDIA Container Toolkit to be installed and configured; see the NVIDIA Container Toolkit installation guide.

services:
  fpn-gpu0:
    image: fermah-fpn:24.04
    restart: unless-stopped
    environment:
      NVIDIA_VISIBLE_DEVICES: 0
    volumes:
      - /home/fermah/.fermah:/root/.fermah:rw
      - /var/log/fermah-gpu0:/var/log/fermah:rw
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    networks:
      - fermah-net

  fpn-gpu1:
    image: fermah-fpn:24.04
    restart: unless-stopped
    environment:
      NVIDIA_VISIBLE_DEVICES: 1
    volumes:
      - /home/fermah/.fermah-gpu1:/root/.fermah:rw
      - /var/log/fermah-gpu1:/var/log/fermah:rw
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]
    networks:
      - fermah-net

networks:
  fermah-net:
    external: true

Create the network and start:

docker network create fermah-net
docker compose up -d

For additional GPUs, add more services following the same pattern, incrementing device_ids, NVIDIA_VISIBLE_DEVICES, and the host volume paths.
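As a sketch, a third service for GPU 2 would look like the following, assuming you prepared /home/fermah/.fermah-gpu2 with its own machine secret as described above:

```yaml
  fpn-gpu2:
    image: fermah-fpn:24.04
    restart: unless-stopped
    environment:
      NVIDIA_VISIBLE_DEVICES: 2
    volumes:
      - /home/fermah/.fermah-gpu2:/root/.fermah:rw
      - /var/log/fermah-gpu2:/var/log/fermah:rw
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['2']
              capabilities: [gpu]
    networks:
      - fermah-net
```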

Systemd Alternative

If you prefer systemd, create one service per GPU. The key differences between services are:

  • Container name — unique per instance (e.g. fermah-gpu0, fermah-gpu1)
  • GPU device — pinned via --gpus '"device=N"' and NVIDIA_VISIBLE_DEVICES
  • Host directory — each instance mounts its own Fermah home to /root/.fermah inside the container

GPU 0

Create /etc/systemd/system/fermah-gpu0.service:

[Unit]
Description=Fermah Prover Node - GPU 0
After=network.target docker.service
Requires=docker.service

[Service]
Type=simple
User=root
ExecStartPre=/bin/mkdir -p /var/log/fermah-gpu0
ExecStartPre=/bin/chmod 755 /var/log/fermah-gpu0
ExecStart=/usr/bin/docker run --rm --name fermah-gpu0 \
  --gpus '"device=0"' \
  --network host \
  -e HOME=/root \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -v /root/.fermah:/root/.fermah \
  -v /var/log/fermah-gpu0:/var/log/fermah \
  -v /var/run/docker.sock:/var/run/docker.sock \
  fermah-fpn:24.04
ExecStop=/usr/bin/docker stop -t 20 fermah-gpu0
Restart=on-failure
RestartSec=5
TimeoutStopSec=20
LimitNOFILE=65535
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

GPU 1

Create /etc/systemd/system/fermah-gpu1.service:

[Unit]
Description=Fermah Prover Node - GPU 1
After=network.target docker.service
Requires=docker.service

[Service]
Type=simple
User=root
ExecStartPre=/bin/mkdir -p /var/log/fermah-gpu1
ExecStartPre=/bin/chmod 755 /var/log/fermah-gpu1
ExecStart=/usr/bin/docker run --rm --name fermah-gpu1 \
  --gpus '"device=1"' \
  --network host \
  -e HOME=/root \
  -e NVIDIA_VISIBLE_DEVICES=1 \
  -v /root/.fermah-gpu1:/root/.fermah \
  -v /var/log/fermah-gpu1:/var/log/fermah \
  -v /var/run/docker.sock:/var/run/docker.sock \
  fermah-fpn:24.04
ExecStop=/usr/bin/docker stop -t 20 fermah-gpu1
Restart=on-failure
RestartSec=5
TimeoutStopSec=20
LimitNOFILE=65535
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
Reload systemd, then enable and start both services:

sudo systemctl daemon-reload
sudo systemctl enable fermah-gpu0.service fermah-gpu1.service
sudo systemctl start fermah-gpu0.service fermah-gpu1.service

Monitoring

Each instance writes logs to its own directory (/var/log/fermah-gpu0, /var/log/fermah-gpu1, etc.):

# Docker Compose
docker compose logs fpn-gpu0 -f
docker compose logs fpn-gpu1 -f

# Systemd
journalctl -u fermah-gpu0.service -f
journalctl -u fermah-gpu1.service -f