Prometheus + Grafana monitoring stack for Substrate-based blockchain nodes. Simple, unified configuration that works out of the box.
- Prometheus - Metrics collection and storage (60 days retention)
- Grafana - Metrics visualization with pre-configured dashboards
- Node Exporter - System metrics (CPU, RAM, Disk, Network)
- Nginx Reverse Proxy - Prometheus protected with Basic Auth + Rate Limiting
- Network Dashboards - Pre-configured dashboards for multiple blockchain networks
- Quantus Branding - Custom logo, colors, and styling matching Quantus design
- Single Setup - One configuration, works everywhere
# 1. Clone repository
git clone <your-repo-url>
cd monitoring
# 2. (Optional) Customize credentials, SMTP, Telegram & alert emails
cp env.example .env
nano .env # Set passwords, SMTP settings, Telegram, and ALERT_EMAIL_ADDRESSES
# 3. Start the stack
docker compose up -d
# 4. Access services
open http://localhost:3000 # Grafana (public dashboards, login: admin / admin)
open http://localhost:9091  # Prometheus (admin / prometheus)
That's it!
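To double-check that everything came up, a quick sanity check (assumes the default ports above):
docker compose ps                                                      # all services should be "running"
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/login  # expect 200 from Grafana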
Notes:
- Grafana: Dashboards are publicly visible, but editing requires login (admin/admin)
- Prometheus: Secured with Basic Auth (admin/prometheus)
- Grafana: http://localhost:3000 (dashboards visible to everyone, editing requires login)
- Prometheus: http://localhost:9091 (Basic Auth: admin/prometheus)
- Node Exporter: http://localhost:9100/metrics (metrics endpoint)
The stack monitors:
- Prometheus - Self-monitoring (metrics collection system)
- Node Exporter - Docker host system metrics
- CPU usage and load averages
- Memory usage and availability
- Disk usage and I/O
- Network traffic (receive/transmit)
- System uptime
- Remote Blockchain Nodes - Heisenberg and Dirac networks
- Node metrics (system resources, peers, network I/O)
- Substrate metrics (block production, finalization)
- Mining metrics (hashrate, difficulty)
- Support Services - Telemetry and monitoring infrastructure
- Telemetry Host (qm-telemetry.quantus.cat) - VPS system metrics
- Telemetry Backend (feed-telemetry.quantus.cat) - Application metrics
- Connected nodes/feeds/shards
- Message rates and dropped messages
- Service availability
Edit prometheus/prometheus.yml to add your own node targets:
scrape_configs:
  # Add your nodes here
  - job_name: 'my-validator'
    scrape_interval: 10s
    static_configs:
      - targets: ['validator1.example.com:9615']
        labels:
          instance: 'validator-1'
          chain: 'polkadot'
          role: 'validator'
Reload Prometheus:
# With authentication
curl -u admin:prometheus -X POST http://localhost:9091/-/reload
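To confirm the new target is actually being scraped, you can query Prometheus's HTTP API through the proxy (default credentials; the job name matches the example above):
# 'up' is 1 when the target's last scrape succeeded, 0 when it failed
curl -s -u admin:prometheus -G http://localhost:9091/api/v1/query \
  --data-urlencode 'query=up{job="my-validator"}'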
Optional - create .env from .env.example:
# Grafana Configuration
GRAFANA_ADMIN_PASSWORD=admin
# Prometheus Basic Auth (via Nginx)
# Credentials are generated at nginx container startup
PROMETHEUS_USER=admin
PROMETHEUS_PASSWORD=prometheus
Security Tip: For production, use strong credentials:
PROMETHEUS_USER=monitoring_$(openssl rand -hex 8)
PROMETHEUS_PASSWORD=$(openssl rand -base64 32)
To enable email notifications in Grafana, configure SMTP settings in your .env file:
# SMTP Configuration for Grafana Email Notifications
SMTP_ENABLED=true
SMTP_HOST=smtp.example.com:587
SMTP_USER=your-email@example.com
SMTP_PASSWORD=your_smtp_password_here
SMTP_FROM_ADDRESS=your-email@example.com
SMTP_FROM_NAME=Grafana Monitoring
SMTP_STARTTLS_POLICY=MandatoryStartTLS
# Alert Email Addresses (comma-separated)
ALERT_EMAIL_ADDRESSES=admin@example.com, alerts@example.com
Note: Copy env.example to .env and update with your SMTP credentials and alert email addresses:
cp env.example .env
nano .env  # Edit SMTP settings and ALERT_EMAIL_ADDRESSES
After configuring SMTP, restart Grafana:
docker compose restart grafana
To test email notifications:
- Go to Grafana → Alerting → Contact points
- Click "New contact point"
- Select "Email" as the type
- Enter test email address
- Click "Test" to send a test email
Grafana has built-in Telegram support for instant mobile alerts. Critical alerts are automatically sent to both Telegram and Email.
Setup Steps:
1. Create a Telegram Bot:
# Open Telegram and message @BotFather
/newbot
# Follow the instructions
# You'll receive a bot token like: 123456789:ABCdefGHIjklMNOpqrsTUVwxyz
2. Get your Chat ID:
# Send any message to your bot in Telegram
# Then visit this URL in your browser (replace <YOUR_BOT_TOKEN>):
https://api.telegram.org/bot<YOUR_BOT_TOKEN>/getUpdates
# Look for "chat":{"id":123456789} in the JSON response
# The number is your Chat ID
3. Add to your .env file:
# Telegram Configuration
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrsTUVwxyz
TELEGRAM_CHAT_ID=123456789
4. Restart Grafana:
docker compose restart grafana
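Before wiring it into Grafana, you can sanity-check the bot token and chat ID directly against the Telegram Bot API (replace the placeholders with your own values):
# Should return {"ok":true,...} and deliver a message to your chat
curl -s "https://api.telegram.org/bot<YOUR_BOT_TOKEN>/sendMessage" \
  -d chat_id=<YOUR_CHAT_ID> -d text="Monitoring stack test"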
Alert Routing (Already Configured):
- Critical Alerts → Telegram + Email
- Warning Alerts → Email only
- Dirac network → Highest priority (2min wait, 30min repeat)
- Heisenberg network → Medium priority (10min wait, 2h repeat)
Message Format:
Node Down
Status: firing
Severity: critical
Chain: dirac
Instance: a1-qm-dirac.quantus.cat
Node a1-qm-dirac.quantus.cat is DOWN
Node is down for more than 5 minutes - check immediately
View in Grafana
To test:
- Go to Grafana → Alerting → Contact points
- Find "Telegram Notifications"
- Click "Test" to send a test message
Note: If you don't configure Telegram (leave variables empty), only Email notifications will be used.
Alerts are configured via provisioning files in grafana/provisioning/alerting/:
Pre-configured Alerts:
Node Health:
- Node Down - Triggers when a node is unreachable for 5+ minutes
- No New Blocks - Triggers when no new blocks are produced for 3+ minutes
- Low Peer Count - Triggers when peer count drops below 3
System Resources:
- Low Disk Space - Triggers when disk usage exceeds 85%
- High CPU Usage - Triggers when CPU usage exceeds 80% for 15+ minutes
- High Memory Usage - Triggers when memory usage exceeds 90%
Support Services:
- Telemetry Host Down - Triggers when the telemetry host is unreachable for 5+ minutes
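The conditions above map onto fairly standard Prometheus expressions. The authoritative definitions live in grafana/provisioning/alerting/rules.yml; the queries below are only an illustrative sketch of what such checks typically look like with node_exporter metrics:
# Node Down: the target failed its last scrape
up == 0
# Low Disk Space: root filesystem more than 85% used
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
# High CPU Usage: average non-idle CPU above 80%
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80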
Customizing Alert Email:
Alert email addresses are configured in your .env file. Edit the ALERT_EMAIL_ADDRESSES variable:
# Single email
ALERT_EMAIL_ADDRESSES=your-email@example.com
# Multiple emails (comma-separated)
ALERT_EMAIL_ADDRESSES=email1@example.com, email2@example.com, team@example.com
After editing .env, rebuild and restart Grafana:
docker compose up -d --build grafana
Adding Custom Alerts:
Edit grafana/provisioning/alerting/rules.yml. Use the reduce + threshold pattern:
- uid: custom-alert
  title: My Custom Alert
  condition: C  # Final threshold step
  data:
    # Step A: Prometheus query
    - refId: A
      datasourceUid: prometheus
      model:
        datasource:
          type: prometheus
          uid: prometheus
        expr: your_prometheus_query_here
        refId: A
        instant: false
        range: true
    # Step B: Reduce to single value
    - refId: B
      datasourceUid: __expr__
      model:
        datasource:
          type: __expr__
          uid: __expr__
        expression: A
        reducer: last  # or min, max, mean
        refId: B
        type: reduce
    # Step C: Threshold comparison
    - refId: C
      datasourceUid: __expr__
      model:
        datasource:
          type: __expr__
          uid: __expr__
        conditions:
          - evaluator:
              params: [threshold_value]
              type: gt  # gt (>), lt (<), eq (=)
            operator:
              type: and
            query:
              params: [C]
            reducer:
              params: []
              type: last
            type: query
        expression: B
        refId: C
        type: threshold
  for: 5m
  annotations:
    description: 'Alert description with {{ $value }}'
    summary: 'Alert summary'
  labels:
    severity: warning  # or critical
  notification_settings:
    receiver: Email Notifications
Alert Notification Policies:
Policies are configured in grafana/provisioning/alerting/policies.yml with different priorities for each network:
| Network | Priority | First Notification | Repeat Interval |
|---|---|---|---|
| Dirac | Highest | 2 minutes | every 30 min |
| Heisenberg | Medium | 10 minutes | every 2h |
Fallback by severity (if no chain label):
- Critical alerts (severity=critical): 10s wait, repeat every 1h
- Warning alerts (severity=warning): 30s wait, repeat every 4h
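For orientation, that routing corresponds roughly to the following shape in Grafana's notification-policy provisioning format (the actual file is grafana/provisioning/alerting/policies.yml; the matchers and timings there are authoritative, and the snippet below is only illustrative):
apiVersion: 1
policies:
  - orgId: 1
    receiver: Email Notifications         # default/fallback receiver
    routes:
      - receiver: Telegram Notifications  # Dirac: highest priority
        object_matchers:
          - ['chain', '=', 'dirac']
        group_wait: 2m
        repeat_interval: 30m
      - receiver: Email Notifications     # Heisenberg: medium priority
        object_matchers:
          - ['chain', '=', 'heisenberg']
        group_wait: 10m
        repeat_interval: 2h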
After changing alert configuration, restart Grafana:
docker compose restart grafana
Troubleshooting Alert Provisioning:
If you see errors like UNIQUE constraint failed: alert_rule.guid, it means alerts were already created in Grafana UI and conflict with provisioned alerts. To fix:
# Option 1: Reset Grafana data (loses all UI changes)
docker compose down
docker volume rm monitoring_grafana-data
docker compose up -d
# Option 2: Change UIDs in rules.yml if you want to keep existing alerts
# Edit each alert's 'uid' field to a unique value
Note: With provisioning, manage alerts through YAML files instead of the UI. UI changes may conflict with provisioned configuration.
Place JSON dashboard files in grafana/dashboards/ directory. They will be automatically loaded on startup.
You can export dashboards from:
- Grafana Dashboard Repository
- Your existing Grafana instance
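For example, a community dashboard can be pulled from grafana.com straight into the auto-load folder. The dashboard ID and target path below are illustrative, and you may need to adjust the datasource UID inside the downloaded JSON to match this stack's provisioned Prometheus datasource:
# Download "Node Exporter Full" (ID 1860) from grafana.com and auto-load it
curl -sL https://grafana.com/api/dashboards/1860/revisions/latest/download \
  -o grafana/dashboards/system/node-exporter-full.json
docker compose restart grafana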
# All services
docker compose logs -f
# Specific service
docker compose logs -f prometheus
docker compose logs -f grafana
# All services
docker compose restart
# Specific service
docker compose restart prometheus
# Stop services
docker compose down
# Stop and remove data volumes (caution!)
docker compose down -v
docker compose pull
docker compose up -d
- Prometheus data: Stored in Docker volume prometheus-data (60 days retention, 30GB max)
- Grafana data: Stored in Docker volume grafana-data (dashboards, datasources, settings)
To backup:
# Backup Prometheus
docker run --rm -v monitoring_prometheus-data:/data -v $(pwd):/backup alpine tar czf /backup/prometheus-backup.tar.gz /data
# Backup Grafana
docker run --rm -v monitoring_grafana-data:/data -v $(pwd):/backup alpine tar czf /backup/grafana-backup.tar.gz /data
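A matching restore sketch, assuming the default monitoring_ volume prefix used above (stop the stack first, since this overwrites the volume contents):
# Restore Prometheus data from the backup archive
docker compose down
docker run --rm -v monitoring_prometheus-data:/data -v $(pwd):/backup alpine \
  sh -c "rm -rf /data/* && tar xzf /backup/prometheus-backup.tar.gz -C /"
docker compose up -d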
The monitoring stack is fully customized with Quantus branding:
- Custom Logo: Quantus logo replaces default Grafana branding
- Custom Favicon: Quantus icon appears in browser tabs
- App Title: "Quantus Monitoring" instead of "Grafana"
- Login Subtitle: "Blockchain Network Monitoring"
The dashboards use Quantus color scheme:
- Blue (#0000ff, #1f1fa3) - Healthy/OK state
- Pink (#ed4cce) - Warning state
- Yellow (#ffe91f) - Critical state
- Dark Background (#0c1014) - Main background
Last Block Time (seconds):
- Blue (< 3 min) - Normal block production
- Pink (3-10 min) - Slow block production
- Yellow (> 10 min) - Critical delay
Uptime (percentage over 30 days):
- Blue (> 90%) - Excellent availability
- Pink (50-90%) - Degraded service
- Yellow (< 50%) - Critical downtime
All branding assets are located in grafana/branding/:
grafana/branding/
├── logo.svg     # Quantus logo (SVG)
├── logo.png     # Quantus logo (PNG)
└── favicon.ico  # Browser favicon
To customize:
- Replace files in grafana/branding/ with your own
- Restart Grafana: docker compose restart grafana
- Hard refresh browser (Ctrl+Shift+R / Cmd+Shift+R)
Branding configuration is in docker-compose.yml under Grafana environment variables (GF_BRANDING_*).
monitoring/
├── docker-compose.yml              # Main configuration
├── prometheus/
│   └── prometheus.yml              # Prometheus scrape configs
├── nginx/
│   ├── nginx.conf                  # Nginx reverse proxy config
│   ├── Dockerfile                  # Custom nginx image with htpasswd
│   └── docker-entrypoint.sh        # Auth generation script
├── grafana/
│   ├── dashboards/                 # Pre-loaded dashboards (by network)
│   │   ├── general/                # Welcome/overview dashboard
│   │   ├── system/                 # System monitoring dashboards
│   │   ├── heisenberg/
│   │   └── dirac/
│   ├── branding/                   # Quantus branding assets
│   │   ├── logo.svg                # Quantus logo (SVG)
│   │   ├── logo.png                # Quantus logo (PNG)
│   │   └── favicon.ico             # Browser favicon
│   └── provisioning/               # Auto-configuration
│       ├── datasources/            # Prometheus datasource
│       ├── dashboards/             # Dashboard providers
│       └── alerting/               # Alert configuration (provisioning)
│           ├── rules.yml           # Alert rules
│           ├── contactpoints.yml   # Contact points (email, etc.)
│           └── policies.yml        # Notification policies
├── .env.example                    # Environment variables template
├── .gitignore
└── README.md
The stack comes with pre-configured dashboards organized by network:
Welcome Dashboard - First page you see when opening Grafana:
- Chain Height for all 3 networks
- Last Block Time (in seconds, color-coded)
- 30-day Uptime percentage (color-coded)
- Support Services Status - Quick status of telemetry infrastructure
- Telemetry Host availability
- Connected nodes count
- Visible without login
- Auto-refreshes every 10 seconds
Color indicators:
- Blue = Healthy
- Pink = Warning
- Yellow = Critical
Located in the System folder:
Localhost Monitoring - Docker host system metrics:
- CPU Usage (current & over time)
- Memory Usage (current & over time)
- Disk Usage
- System Load (1m, 5m, 15m averages)
- Network I/O (receive/transmit)
- Disk I/O (read/write)
- System Uptime
Telemetry Monitoring - Telemetry infrastructure monitoring:
- Host Metrics (VPS): CPU, memory, disk usage, network I/O, system load
- Backend Metrics: Connected feeds/nodes/shards, message rates, dropped messages
- Real-time status of telemetry collection infrastructure
Both use Quantus color scheme with dynamic thresholds.
- Node Metrics - System resources, peers, network I/O
- TXPool - Transaction pool statistics
- Business Metrics - Block times, difficulty, chain height
Each dashboard shows:
- Block height (best & finalized)
- Connected peers
- Memory & CPU usage
- Network traffic
- Mining/validation metrics
Perfect for monitoring Substrate-based validators and full nodes.
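On the node side, Substrate-based binaries expose Prometheus metrics on port 9615 by default. A typical way to make that endpoint scrapable from this stack, using the standard Substrate CLI flags (check your node's --help; the binary name is a placeholder):
# Bind the metrics endpoint to all interfaces so a remote Prometheus can scrape it
./your-node --prometheus-external --prometheus-port 9615
# Quick check from the node host:
curl -s http://localhost:9615/metrics | head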
Edit docker-compose.yml:
services:
  prometheus:
    command:
      - '--storage.tsdb.retention.time=90d'   # Change retention period
      - '--storage.tsdb.retention.size=50GB'  # Change max size
By default, services are accessible from localhost. To expose on your network, edit docker-compose.yml:
ports:
  - "0.0.0.0:3000:3000"  # Instead of "3000:3000"
- Check target status: http://localhost:9091/targets (use Basic Auth)
- Verify target is accessible from Prometheus container
- Check Prometheus logs:
docker compose logs prometheus
This means rate limiting is too strict. Current settings allow 30 requests/second (burst 50), which should be enough. If you still see errors:
- Check nginx logs: docker compose logs nginx
- Adjust rate limits in nginx/nginx.conf if needed
- Restart nginx: docker compose restart nginx
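If you do need to loosen the limits, the relevant nginx directives look roughly like this (illustrative sketch only; the real values and location blocks live in nginx/nginx.conf, and the upstream name/port are assumptions):
# In the http {} context: shared zone keyed by client IP, allowing 30 requests/second
limit_req_zone $binary_remote_addr zone=prometheus:10m rate=30r/s;
server {
    location / {
        limit_req zone=prometheus burst=50 nodelay;
        proxy_pass http://prometheus:9090;
    }
}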
Prometheus is protected with Basic Auth. Use credentials from .env:
# Default credentials
Username: admin
Password: prometheus
# Or check your .env file
cat .env | grep PROMETHEUS
- Verify Prometheus datasource: Grafana → Configuration → Data Sources
- Check if Prometheus is scraping: http://localhost:9091/targets (use Basic Auth)
- Adjust time range in dashboard
On Linux, add to each service in docker-compose.yml:
extra_hosts:
  - "host.docker.internal:host-gateway"
If Node Exporter can't read system metrics, ensure proper volume mounts:
volumes:
  - /proc:/host/proc:ro
  - /sys:/host/sys:ro
  - /:/host:ro
This stack includes built-in security (Nginx + Basic Auth + Rate Limiting). For production:
- Prometheus Basic Auth - Already configured (change credentials in .env)
- Rate Limiting - 30 req/sec, prevents brute-force attacks
- Strong Credentials - Generate secure passwords:
  PROMETHEUS_USER=monitoring_$(openssl rand -hex 8)
  PROMETHEUS_PASSWORD=$(openssl rand -base64 32)
- SSL/TLS - Use Cloudflare Tunnel or a reverse proxy (Caddy, Traefik)
- Firewall - Restrict ports or use a VPN
# Prometheus is already secured with Basic Auth
# Add Cloudflare Tunnel for SSL + DDoS protection
# See: https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/
# Your monitoring stays private, Cloudflare handles SSL
- Increase retention if needed: Edit docker-compose.yml storage settings
- Set up backups for Docker volumes (a cron sketch follows below)
- Monitor the monitoring - Set up alerting for stack availability
- Regular updates: docker compose pull && docker compose up -d
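For the backup item, one simple option is a cron job on the Docker host that reuses the backup command shown earlier (the schedule and /backups path are illustrative):
# Weekly Grafana volume backup, Sundays at 03:00 (note: % must be escaped in crontab)
0 3 * * 0 docker run --rm -v monitoring_grafana-data:/data -v /backups:/backup alpine tar czf /backup/grafana-$(date +\%F).tar.gz /data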
# 1. Edit .env
nano .env # Change PROMETHEUS_USER and PROMETHEUS_PASSWORD
# 2. Restart nginx (generates new htpasswd)
docker compose restart nginx
# 3. Verify
curl -u newuser:newpass http://localhost:9091/
Internet → Cloudflare (SSL/DDoS) → Nginx (Auth/Rate Limit) → Prometheus
Defense in Depth: Basic Auth + Rate Limiting + Cloudflare provide layered protection.
- Docker
- Docker Compose
- 2GB+ RAM recommended
- ~30GB disk space for default retention settings
- Substrate
- Polkadot
- Kusama
- Any Substrate-based parachain
- Generic Prometheus metrics
See LICENSE file for details.
Issues and pull requests welcome!
For Substrate/Polkadot metrics documentation: