
Multi-GPU Setup

Machines with multiple GPUs can run one Prover Node instance per GPU. Each instance runs in its own container with a dedicated Fermah home directory and its own machine secret.

Each instance must have a unique machine secret. Sharing the same secret across instances will cause registration conflicts.

Configuration

Your prover-node-config.toml must include an entry for each GPU on the machine:

[[hardware.gpus]]
price = "117"
resource.gpuId = "unknown-gpu-0"

[[hardware.gpus.resource.specs]]
VRAM = 25769803776

[[hardware.gpus]]
price = "117"
resource.gpuId = "unknown-gpu-1"

[[hardware.gpus.resource.specs]]
VRAM = 25769803776

If your machine has more than two GPUs, add additional [[hardware.gpus]] entries following the same pattern, incrementing the gpuId index.
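For example, a machine with a third GPU would append one more entry (a sketch following the pattern above; adjust `price` and `VRAM` to your hardware):

```toml
[[hardware.gpus]]
price = "117"
resource.gpuId = "unknown-gpu-2"

[[hardware.gpus.resource.specs]]
VRAM = 25769803776
```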

Container Image

Build a minimal container image for the prover node:

FROM nvidia/cuda:12.9.1-runtime-ubuntu24.04

RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates tini && \
    rm -rf /var/lib/apt/lists/*

ENV HOME=/root
WORKDIR /root

ENTRYPOINT ["/usr/bin/tini", "--", "/root/.fermah/bin/fpn"]
docker build -t fermah-fpn:24.04 .

Replace the base image with nvidia/cuda:12.2.0-runtime-ubuntu22.04 if you are running CUDA 12.2.

Preparing Home Directories

Each GPU instance needs its own Fermah home directory with a unique machine secret. Generate a new secret before copying each directory:

# GPU 0: generate the first machine secret
prover-node gen-machine-secret

# Copy the home directory for GPU 1, then generate a new secret
cp -r ~/.fermah ~/.fermah-gpu1
prover-node gen-machine-secret
# The base ~/.fermah now has a new secret (GPU 1)

# Swap so GPU 0 keeps the original
mv ~/.fermah ~/.fermah-tmp
mv ~/.fermah-gpu1 ~/.fermah
mv ~/.fermah-tmp ~/.fermah-gpu1

For each additional GPU, repeat the pattern: copy the base directory, generate a new secret, then swap so the base directory keeps GPU 0's original secret.
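The repeated copy/regenerate/swap steps can be scripted. The sketch below exercises the same pattern in a throwaway directory: `gen_secret` is a stand-in for `prover-node gen-machine-secret`, and the `machine_secret` file name is illustrative only (the prover node manages its own files inside `~/.fermah`). On a real machine, drop the stub and run the same loop against `$HOME/.fermah`.

```shell
#!/usr/bin/env bash
# Sketch of the copy -> regenerate -> swap pattern for N GPUs,
# run against a throwaway directory with a stub secret generator.
set -eu

WORK="$(mktemp -d)"
BASE="$WORK/.fermah"
NUM_GPUS=3

gen_secret() {
  # Stand-in for `prover-node gen-machine-secret`: write a fresh
  # random hex secret into the base directory.
  { head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'; echo; } > "$BASE/machine_secret"
}

mkdir -p "$BASE"
gen_secret                                  # secret for GPU 0

for i in $(seq 1 $((NUM_GPUS - 1))); do
  cp -r "$BASE" "$WORK/.fermah-gpu$i"       # snapshot the current secret
  gen_secret                                # base now holds a brand-new secret
  mv "$BASE" "$WORK/.fermah-tmp"            # swap so the base keeps GPU 0's secret
  mv "$WORK/.fermah-gpu$i" "$BASE"
  mv "$WORK/.fermah-tmp" "$WORK/.fermah-gpu$i"
done

# Every directory should now hold a distinct secret.
UNIQUE="$(sort -u "$BASE/machine_secret" "$WORK"/.fermah-gpu*/machine_secret | wc -l | tr -d ' ')"
echo "unique secrets: $UNIQUE"
```

After the loop, the base directory still holds GPU 0's original secret and each `.fermah-gpu<i>` directory holds its own, which is exactly the invariant the registration step depends on.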

After creating each home directory, you must register each instance independently.

Docker Compose

Docker Compose is the recommended approach for multi-GPU setups. Use the device_ids field to pin each service to a specific GPU.

This requires the NVIDIA Container Toolkit to be installed and configured; see the NVIDIA Container Toolkit installation guide.

services:
  fpn-gpu0:
    image: fermah-fpn:24.04
    restart: unless-stopped
    environment:
      NVIDIA_VISIBLE_DEVICES: 0
    volumes:
      - /home/fermah/.fermah:/root/.fermah:rw
      - /var/log/fermah-gpu0:/var/log/fermah:rw
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0']
              capabilities: [gpu]
    networks:
      - fermah-net

  fpn-gpu1:
    image: fermah-fpn:24.04
    restart: unless-stopped
    environment:
      NVIDIA_VISIBLE_DEVICES: 1
    volumes:
      - /home/fermah/.fermah-gpu1:/root/.fermah:rw
      - /var/log/fermah-gpu1:/var/log/fermah:rw
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['1']
              capabilities: [gpu]
    networks:
      - fermah-net

networks:
  fermah-net:
    external: true

Create the network and start:

docker network create fermah-net
docker compose up -d

For additional GPUs, add more services following the same pattern, incrementing device_ids, NVIDIA_VISIBLE_DEVICES, and the host volume paths.
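As a sketch, a third service for GPU 2 would look like the following, assuming you prepared /home/fermah/.fermah-gpu2 with its own machine secret as described above:

```yaml
  fpn-gpu2:
    image: fermah-fpn:24.04
    restart: unless-stopped
    environment:
      NVIDIA_VISIBLE_DEVICES: 2
    volumes:
      - /home/fermah/.fermah-gpu2:/root/.fermah:rw
      - /var/log/fermah-gpu2:/var/log/fermah:rw
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['2']
              capabilities: [gpu]
    networks:
      - fermah-net
```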

Systemd Alternative

If you prefer systemd, create one service per GPU. The key differences between services are:

  • Container name — unique per instance (e.g. fermah-gpu0, fermah-gpu1)
  • GPU device — pinned via --gpus '"device=N"' and NVIDIA_VISIBLE_DEVICES
  • Host directory — each instance mounts its own Fermah home to /root/.fermah inside the container

GPU 0

Create /etc/systemd/system/fermah-gpu0.service:

[Unit]
Description=Fermah Prover Node - GPU 0
After=network.target docker.service
Requires=docker.service

[Service]
Type=simple
User=root
ExecStartPre=/bin/mkdir -p /var/log/fermah-gpu0
ExecStartPre=/bin/chmod 755 /var/log/fermah-gpu0
ExecStart=/usr/bin/docker run --rm --name fermah-gpu0 \
  --gpus '"device=0"' \
  --network host \
  -e HOME=/root \
  -e NVIDIA_VISIBLE_DEVICES=0 \
  -v /root/.fermah:/root/.fermah \
  -v /var/log/fermah-gpu0:/var/log/fermah \
  -v /var/run/docker.sock:/var/run/docker.sock \
  fermah-fpn:24.04
ExecStop=/usr/bin/docker stop -t 20 fermah-gpu0
Restart=on-failure
RestartSec=5
TimeoutStopSec=20
LimitNOFILE=65535
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

GPU 1

Create /etc/systemd/system/fermah-gpu1.service:

[Unit]
Description=Fermah Prover Node - GPU 1
After=network.target docker.service
Requires=docker.service

[Service]
Type=simple
User=root
ExecStartPre=/bin/mkdir -p /var/log/fermah-gpu1
ExecStartPre=/bin/chmod 755 /var/log/fermah-gpu1
ExecStart=/usr/bin/docker run --rm --name fermah-gpu1 \
  --gpus '"device=1"' \
  --network host \
  -e HOME=/root \
  -e NVIDIA_VISIBLE_DEVICES=1 \
  -v /root/.fermah-gpu1:/root/.fermah \
  -v /var/log/fermah-gpu1:/var/log/fermah \
  -v /var/run/docker.sock:/var/run/docker.sock \
  fermah-fpn:24.04
ExecStop=/usr/bin/docker stop -t 20 fermah-gpu1
Restart=on-failure
RestartSec=5
TimeoutStopSec=20
LimitNOFILE=65535
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
Reload systemd, then enable and start both services:

sudo systemctl daemon-reload
sudo systemctl enable fermah-gpu0.service fermah-gpu1.service
sudo systemctl start fermah-gpu0.service fermah-gpu1.service

Monitoring

Each instance writes logs to its own directory (/var/log/fermah-gpu0, /var/log/fermah-gpu1, etc.):

# Docker Compose
docker compose logs fpn-gpu0 -f
docker compose logs fpn-gpu1 -f

# Systemd
journalctl -u fermah-gpu0.service -f
journalctl -u fermah-gpu1.service -f