# Telemetry
Prover Nodes run a local OpenTelemetry Collector sidecar that collects metrics and traces from the prover node process, scrapes host-level metrics, and forwards everything to the Fermah Gateway via authenticated gRPC.
The Datadog Agent has been fully deprecated as of April 17, 2026. All telemetry is now handled through the OTel Collector. If you are still running a Datadog Agent, see Migrating from Datadog Agent below.
## Installation
Telemetry is installed through the Fermah install script. During installation, select the Telemetry option to automatically set up the OTel Collector.
The installer creates the collector configuration, Docker Compose file, and `.env` file with the required environment variables.
You can re-run the install script at any time and select only the Telemetry step to update or reinstall the collector.
## Prerequisites
The OTel Collector requires root Docker (not rootless) to access host metrics through `/proc`, `/sys`, and the host filesystem.
Your server needs both Docker installations:
| Docker Mode | Used By |
|---|---|
| Root (privileged) | Telemetry collector (host metrics) |
| Rootless | Prover containers |
Both are set up during the server preparation step.
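A quick way to sanity-check that both daemons are present is to look for their sockets. This is an illustrative sketch, not part of the installer: the paths below are common defaults and may differ on your system.

```bash
# Sketch: check for both Docker daemons via their default socket paths.
# /var/run/docker.sock          -> root (privileged) daemon
# /run/user/<uid>/docker.sock   -> rootless daemon
for sock in /var/run/docker.sock "/run/user/$(id -u)/docker.sock"; do
  if [ -S "$sock" ]; then
    echo "found:   $sock"
  else
    echo "missing: $sock"
  fi
done
```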
## Architecture

The collector receives traces and metrics from the prover node over `localhost:4317`, scrapes host-level metrics (CPU, memory, disk, network), and forwards everything to the Fermah Gateway.
```
Prover Node ──▶ OTel Collector (localhost) ──▶ Fermah Gateway
                        ▲
                  Host Metrics
          (CPU, memory, disk, network)
```

Logs are not collected by the telemetry pipeline. Only metrics and traces are forwarded to the gateway.
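A minimal liveness check is to probe the collector's OTLP/HTTP listener from the host. This is a hedged sketch, assuming `curl` is installed and the default port mapping from this guide:

```bash
# Sketch: probe the collector's OTLP/HTTP port (4318) on the host.
# "000" means nothing is listening; any HTTP status (even 4xx) means
# the collector answered.
curl -s -o /dev/null -w "otlp-http 4318 -> %{http_code}\n" \
  http://localhost:4318/v1/metrics || true
```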
## Environment Variables

The collector requires three environment variables, which the installer writes to a `.env` file:
| Variable | Description |
|---|---|
| `FERMAH_OTEL_TOKEN` | Bearer token for authenticating with the Fermah Gateway |
| `FERMAH_GATEWAY_ENDPOINT` | Gateway gRPC endpoint (e.g. `telemetry.fermah.xyz:4317`) |
| `FERMAH_OPERATOR_HOST` | Friendly name for this operator (e.g. `fermah-cp-1a2b`). Shown as the `fermah.operator_host` tag. Defaults to the OS hostname if unset. |
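For reference, the installer-generated `.env` has this shape. The values below are placeholders taken from the examples above, not real credentials:

```bash
# .env -- written by the installer; values are illustrative placeholders
FERMAH_OTEL_TOKEN=replace-with-your-token
FERMAH_GATEWAY_ENDPOINT=telemetry.fermah.xyz:4317
FERMAH_OPERATOR_HOST=fermah-cp-1a2b
```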
## Running Telemetry
The collector runs as a root Docker container. A shared `fermah-net` network is required for connectivity:

```bash
docker network create fermah-net
```

Start the collector:

```bash
sudo docker compose -p fermah-telemetry up -d
```

Verify it is running:

```bash
sudo docker compose -p fermah-telemetry logs
```

## Collector Reference
Below is the full collector configuration and Docker Compose file for reference. These are managed automatically by the installer.
### docker-compose.yml

```yaml
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.150.1
    container_name: fermah-otel-collector
    restart: unless-stopped
    pid: host
    env_file: .env
    environment:
      - FERMAH_OTEL_TOKEN=${FERMAH_OTEL_TOKEN}
      - FERMAH_GATEWAY_ENDPOINT=${FERMAH_GATEWAY_ENDPOINT}
      - FERMAH_OPERATOR_HOST=${FERMAH_OPERATOR_HOST}
    volumes:
      - ./otel-collector.yml:/etc/otelcol-contrib/config.yaml:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/hostfs:ro
    ports:
      - "4317:4317"
      - "4318:4318"
    networks:
      - fermah-net

networks:
  fermah-net:
    external: true
```
### otel-collector.yml

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 10s
    root_path: /hostfs
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          system.cpu.logical.count:
            enabled: true
      disk:
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
        exclude_mount_points:
          mount_points: ["/snap/*", "/boot", "/hostfs/snap/*", "/hostfs/boot"]
          match_type: regexp
        exclude_fs_types:
          fs_types: [squashfs, tmpfs, devtmpfs, sysfs, proc]
          match_type: strict
      load:
      memory:
      network:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      processes:
  prometheus:
    config:
      scrape_configs:
        - job_name: otelcol
          scrape_interval: 10s
          static_configs:
            - targets: ["0.0.0.0:8888"]

processors:
  batch:
    send_batch_max_size: 1000
    send_batch_size: 100
    timeout: 10s
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  resourcedetection:
    detectors: [env, system]
    system:
      resource_attributes:
        os.description:
          enabled: true
        host.arch:
          enabled: true
        host.cpu.vendor.id:
          enabled: true
        host.cpu.family:
          enabled: true
        host.cpu.model.id:
          enabled: true
        host.cpu.model.name:
          enabled: true
        host.cpu.stepping:
          enabled: true
        host.cpu.cache.l2.size:
          enabled: true
  resource/operator_host:
    attributes:
      - key: host.name
        value: "${FERMAH_OPERATOR_HOST}"
        action: upsert
      - key: fermah.operator_host
        value: "${FERMAH_OPERATOR_HOST}"
        action: upsert

exporters:
  otlp_grpc:
    endpoint: "${FERMAH_GATEWAY_ENDPOINT}"
    headers:
      authorization: "Bearer ${FERMAH_OTEL_TOKEN}"
    tls:
      insecure: true
    sending_queue:
      enabled: true
      queue_size: 1000
    retry_on_failure:
      enabled: true

connectors:
  datadog/connector:

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resourcedetection, resource/operator_host, batch]
      exporters: [datadog/connector, otlp_grpc]
    metrics:
      receivers: [datadog/connector, otlp, hostmetrics, prometheus]
      processors: [memory_limiter, resourcedetection, resource/operator_host, batch]
      exporters: [otlp_grpc]
```

## Prover Node Configuration
The telemetry settings live in `~/.fermah/config/prover-node-config.toml`:
```toml
[telemetry]
mode = "Otlp"
layers = "metrics,traces"
level = "info"
filters = []
interval = 30
temporality = "Cumulative"
```

Make sure `layers` is set to `"metrics,traces"` (without logs) in both the `[telemetry]` and `[zksync.telemetry]` sections. The collector does not forward logs to the gateway, so exporting them would only generate unnecessary overhead. If your config still has `"logs,metrics,traces"`, remove the `logs` entry.
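As a quick check that no logs layer is left over, you can grep the config. This is a sketch; it assumes the default config path from this section:

```bash
# Sketch: flag any `layers` line that still includes the logs layer.
CONFIG="${CONFIG:-$HOME/.fermah/config/prover-node-config.toml}"
if grep -E '^\s*layers' "$CONFIG" 2>/dev/null | grep -q 'logs'; then
  echo "logs layer still enabled -- remove it from layers"
else
  echo "ok: no logs layer found"
fi
```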
## Custom Forwarding
Since every operator runs a local OTel Collector, you can easily add your own backends as additional export destinations. See Custom Export for details.
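As an illustrative sketch of what that can look like (the `otlp/mybackend` name and its endpoint are placeholders, not Fermah defaults), a second backend is declared as an extra exporter and appended to a pipeline's exporter list in `otel-collector.yml`:

```yaml
# Hypothetical additions to otel-collector.yml -- placeholder names/endpoints
exporters:
  otlp/mybackend:
    endpoint: "collector.example.com:4317"

service:
  pipelines:
    metrics:
      receivers: [datadog/connector, otlp, hostmetrics, prometheus]
      processors: [memory_limiter, resourcedetection, resource/operator_host, batch]
      exporters: [otlp_grpc, otlp/mybackend]
```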
## Migrating from Datadog Agent
If you are still running the Datadog Agent, bring it down first:
```bash
docker ps                    # find the Datadog Agent container ID
docker rm -f <container_id>  # remove the Datadog Agent container
```

Then re-run the install script and select only the Telemetry step.