Skip to main content
Mole presenting metrics

Metrics and Monitoring

Muti Metroo exposes Prometheus metrics and provides comprehensive monitoring capabilities for production deployments.

Prometheus Metrics Endpoint

Metrics are exposed at the /metrics endpoint:

curl http://localhost:8080/metrics

All metrics are prefixed with muti_metroo_.

Connection Metrics

Monitor peer connectivity and connection health:

MetricTypeDescription
peers_connectedGaugeCurrently connected peers
peers_totalCounterTotal peer connections established
peer_connections_totalCounterConnections by transport and direction
peer_disconnects_totalCounterDisconnections by reason

Labels:

  • peer_connections_total: transport (quic, h2, ws), direction (inbound, outbound)
  • peer_disconnects_total: reason (timeout, error, shutdown)

Example Queries

# Current peer count
muti_metroo_peers_connected

# Connection rate by transport
rate(muti_metroo_peer_connections_total[5m])

# Disconnect rate
rate(muti_metroo_peer_disconnects_total[5m])

Stream Metrics

Monitor stream lifecycle and performance:

MetricTypeDescription
streams_activeGaugeCurrently active streams
streams_opened_totalCounterTotal streams opened
streams_closed_totalCounterTotal streams closed
stream_open_latency_secondsHistogramStream open latency
stream_errors_totalCounterStream errors by type

Labels:

  • stream_errors_total: error_type (timeout, rejected, reset)

Example Queries

# Active stream count
muti_metroo_streams_active

# Stream open rate
rate(muti_metroo_streams_opened_total[5m])

# P99 stream open latency
histogram_quantile(0.99, rate(muti_metroo_stream_open_latency_seconds_bucket[5m]))

# Error rate by type
rate(muti_metroo_stream_errors_total[5m])

Data Transfer Metrics

Monitor throughput and data volume:

MetricTypeDescription
bytes_sent_totalCounterBytes sent by type
bytes_received_totalCounterBytes received by type
frames_sent_totalCounterFrames sent by type
frames_received_totalCounterFrames received by type

Labels:

  • bytes_*: type (data, control)
  • frames_*: frame_type (stream_data, stream_open, keepalive, etc.)

Example Queries

# Data throughput (bytes/sec)
rate(muti_metroo_bytes_sent_total{type="data"}[5m])

# Frame rate by type
rate(muti_metroo_frames_sent_total[5m])

Routing Metrics

Monitor route table and propagation:

MetricTypeDescription
routes_totalGaugeRoutes in routing table
route_advertises_totalCounterRoute advertisements processed
route_withdrawals_totalCounterRoute withdrawals processed
route_flood_latency_secondsHistogramRoute flood propagation latency

Example Queries

# Route table size
muti_metroo_routes_total

# Route advertisement rate
rate(muti_metroo_route_advertises_total[5m])

# Route propagation latency
histogram_quantile(0.95, rate(muti_metroo_route_flood_latency_seconds_bucket[5m]))

SOCKS5 Metrics

Monitor proxy server performance:

MetricTypeDescription
socks5_connections_activeGaugeActive SOCKS5 connections
socks5_connections_totalCounterTotal SOCKS5 connections
socks5_auth_failures_totalCounterAuthentication failures
socks5_connect_latency_secondsHistogramConnect request latency

Example Queries

# Active SOCKS5 connections
muti_metroo_socks5_connections_active

# Auth failure rate
rate(muti_metroo_socks5_auth_failures_total[5m])

# P95 connect latency
histogram_quantile(0.95, rate(muti_metroo_socks5_connect_latency_seconds_bucket[5m]))

Exit Handler Metrics

Monitor exit node operations:

MetricTypeDescription
exit_connections_activeGaugeActive exit connections
exit_connections_totalCounterTotal exit connections
exit_dns_queries_totalCounterDNS queries performed
exit_dns_latency_secondsHistogramDNS query latency
exit_errors_totalCounterExit errors by type

Labels:

  • exit_errors_total: error_type (dial_failed, dns_failed, timeout)

Example Queries

# Exit connection count
muti_metroo_exit_connections_active

# DNS query rate
rate(muti_metroo_exit_dns_queries_total[5m])

# DNS latency
histogram_quantile(0.99, rate(muti_metroo_exit_dns_latency_seconds_bucket[5m]))

Protocol Metrics

Monitor protocol-level operations:

MetricTypeDescription
handshake_latency_secondsHistogramPeer handshake latency
handshake_errors_totalCounterHandshake errors by type
keepalives_sent_totalCounterKeepalives sent
keepalives_received_totalCounterKeepalives received
keepalive_rtt_secondsHistogramKeepalive round-trip time

Example Queries

# Handshake latency
histogram_quantile(0.99, rate(muti_metroo_handshake_latency_seconds_bucket[5m]))

# Keepalive RTT
histogram_quantile(0.95, rate(muti_metroo_keepalive_rtt_seconds_bucket[5m]))

RPC Metrics

Monitor remote command execution:

MetricTypeDescription
rpc_calls_totalCounterRPC calls by result and command
rpc_call_duration_secondsHistogramRPC call duration
rpc_bytes_received_totalCounterBytes received in requests
rpc_bytes_sent_totalCounterBytes sent in responses

Labels:

  • rpc_calls_total: result (success, failed, rejected, auth_failed, error), command
  • rpc_call_duration_seconds: command

Alerting Examples

Example Prometheus alert rules:

groups:
- name: muti-metroo
rules:
# No connected peers
- alert: NoPeersConnected
expr: muti_metroo_peers_connected == 0
for: 5m
labels:
severity: critical
annotations:
summary: "No peers connected on {{ $labels.instance }}"

# High stream error rate
- alert: HighStreamErrors
expr: rate(muti_metroo_stream_errors_total[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "High stream error rate on {{ $labels.instance }}"

# High SOCKS5 auth failures
- alert: HighAuthFailures
expr: rate(muti_metroo_socks5_auth_failures_total[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "High auth failure rate - possible brute force"

# High latency
- alert: HighStreamLatency
expr: histogram_quantile(0.99, rate(muti_metroo_stream_open_latency_seconds_bucket[5m])) > 5
for: 5m
labels:
severity: warning
annotations:
summary: "High stream open latency on {{ $labels.instance }}"

# Route table empty
- alert: NoRoutes
expr: muti_metroo_routes_total == 0
for: 10m
labels:
severity: warning
annotations:
summary: "No routes in routing table on {{ $labels.instance }}"

Grafana Dashboard

Example Grafana dashboard JSON available in the repository.

Key Panels

  1. Overview: Peer count, stream count, route count
  2. Throughput: Bytes/sec sent and received
  3. Latency: Stream open, keepalive RTT, DNS
  4. Errors: Stream errors, auth failures, disconnects
  5. Resources: Memory usage, goroutine count

Dashboard Variables

  • $instance: Filter by Prometheus instance
  • $job: Filter by job name

Health Endpoints

In addition to Prometheus metrics:

# Basic health check
curl http://localhost:8080/health
# Returns: OK

# Detailed health with stats
curl http://localhost:8080/healthz | jq
# Returns JSON with peer count, stream count, etc.

# Kubernetes readiness
curl http://localhost:8080/ready
# Returns: OK when ready to serve traffic