
# Performance Troubleshooting

Diagnose and optimize performance issues.
## Performance Metrics

### Key Metrics to Monitor

```bash
# Get all metrics
curl http://localhost:8080/metrics

# Key performance metrics:
# - Stream latency
# - Throughput
# - Active connections
# - Buffer usage
```

| Metric | What It Tells You |
|---|---|
| `stream_open_latency_seconds` | Time to establish streams |
| `bytes_sent_total` | Data throughput |
| `streams_active` | Current load |
| `keepalive_rtt_seconds` | Network latency |
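To watch a single metric while you reproduce an issue, sampling the endpoint in a loop is often enough. A minimal sketch using standard tools, with the metric name taken from the table above:

```bash
# Re-sample one metric every 5 seconds (skip Prometheus comment lines)
watch -n 5 "curl -s http://localhost:8080/metrics | grep -v '^#' | grep stream_open_latency_seconds"
```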
## High Latency

### Symptoms

- Slow page loads
- SSH feels sluggish
- High `stream_open_latency_seconds`
### Diagnosis

```bash
# Check stream open latency
curl http://localhost:8080/metrics | grep stream_open_latency

# Check keepalive RTT
curl http://localhost:8080/metrics | grep keepalive_rtt

# Count hops
curl http://localhost:8080/healthz | jq '.routes'
# Look at the metric values - each increment is one hop
```
### Solutions

**1. Reduce hop count**

Fewer hops = lower latency.

```text
Before: A -> B -> C -> D -> E (4 hops)
After:  A -> B -> E (2 hops)
```

**2. Use faster transports**

QUIC is generally faster than HTTP/2 or WebSocket.

**3. Optimize network path**

- Use geographically closer relays
- Avoid high-latency links

**4. Tune keepalive**

```yaml
connections:
  idle_threshold: 60s  # Less frequent keepalives
```
## Low Throughput

### Symptoms

- Slow file transfers
- Low `bytes_sent_total` rate
- Buffering on streams

### Diagnosis

```bash
# Check throughput
curl http://localhost:8080/metrics | grep bytes

# Check for throttling
curl http://localhost:8080/metrics | grep stream

# Check buffer status (if available)
```
### Solutions

**1. Increase buffer size**

```yaml
limits:
  buffer_size: 524288  # 512 KB (default 256 KB)
```

Larger buffers = better throughput, but more memory.

**2. Check for bottlenecks**

```bash
# Test network speed between hops
iperf3 -c peer-address -p 5201

# Check if CPU-bound
top -p $(pgrep muti-metroo)
```

**3. Use QUIC**

QUIC handles packet loss better than TCP-based transports.

**4. Reduce frame overhead**

Ensure frames are at or near max size (16 KB).
## High Memory Usage

### Symptoms

- Agent using excessive RAM
- OOM kills
- System slowdown

### Diagnosis

```bash
# Check memory usage
ps aux | grep muti-metroo
cat /proc/$(pgrep muti-metroo)/status | grep -i mem

# Check stream count
curl http://localhost:8080/metrics | grep streams_active
```

### Calculation

```text
Memory per stream = buffer_size x number_of_hops

1000 streams x 256 KB buffer x 3 hops = 768 MB
```
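The same estimate in shell form, handy before changing limits. This is a back-of-the-envelope sketch of buffer memory only; per-connection and runtime overhead come on top:

```bash
# Rough buffer budget: streams x buffer_size x hops
streams=1000
buffer_kb=256
hops=3
echo "~$(( streams * buffer_kb * hops / 1000 )) MB of stream buffers"   # ~768 MB
```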
### Solutions

**1. Reduce buffer size**

```yaml
limits:
  buffer_size: 131072  # 128 KB
```

**2. Limit concurrent streams**

```yaml
limits:
  max_streams_per_peer: 500
  max_streams_total: 2000
```

**3. Reduce hop count**

Fewer hops = less buffering per stream.

**4. Add memory limits (container)**

```yaml
# docker-compose.yml
services:
  agent:
    deploy:
      resources:
        limits:
          memory: 1G
```
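After setting a container limit, it is worth confirming actual usage against it; `docker stats` is the standard check:

```bash
# One-shot view of memory usage vs. the configured limit
# (container name assumed; Compose usually derives it from the service name)
docker stats --no-stream agent
```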
## High CPU Usage

### Symptoms

- Agent using high CPU
- Slow response times
- High system load

### Diagnosis

```bash
# Check CPU usage
top -p $(pgrep muti-metroo)

# CPU profiling
curl http://localhost:8080/debug/pprof/profile?seconds=30 > cpu.prof
go tool pprof cpu.prof
```
### Solutions

**1. Reduce logging**

```yaml
agent:
  log_level: "warn"  # Not debug or info
```

**2. Limit stream count**

More streams = more CPU.

```yaml
limits:
  max_streams_total: 5000
```

**3. Use faster hardware**

CPU-bound workloads benefit from faster cores.
## Connection Issues

### Too Many Connections

```bash
# Check connection count
netstat -an | grep 4433 | wc -l
curl http://localhost:8080/metrics | grep peers_connected
```

Solutions:

```yaml
limits:
  max_streams_per_peer: 500  # Limit per peer
```
### Connection Churn

Frequent connects and disconnects waste resources.

```bash
# Check reconnection rate
curl http://localhost:8080/metrics | grep peer_disconnects
```
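The counter alone does not show the rate; sampling it twice and diffing does. A sketch assuming `peer_disconnects` is a cumulative counter in Prometheus text format:

```bash
# Disconnects over a 60-second window
metric() { curl -s http://localhost:8080/metrics | awk '!/^#/ && /peer_disconnects/ {print $2; exit}'; }
before=$(metric); sleep 60; after=$(metric)
awk -v a="$before" -v b="$after" 'BEGIN {print "disconnects in last 60s:", b - a}'
```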
Solutions:

- Increase timeouts
- Improve network stability
- Check for misbehaving peers
## pprof Debugging

Muti Metroo exposes pprof for profiling:

```bash
# CPU profile
curl http://localhost:8080/debug/pprof/profile?seconds=30 > cpu.prof
go tool pprof cpu.prof

# Memory profile
curl http://localhost:8080/debug/pprof/heap > heap.prof
go tool pprof heap.prof

# Goroutine dump
curl http://localhost:8080/debug/pprof/goroutine?debug=2

# Block profile (where goroutines block)
curl http://localhost:8080/debug/pprof/block > block.prof
go tool pprof block.prof
```
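Once a profile is captured, the standard `go tool pprof` flags are usually enough to locate hotspots:

```bash
# Show the hottest functions in the profile
go tool pprof -top cpu.prof

# Browse interactively (flame graph, source view) in a browser
go tool pprof -http=:8081 cpu.prof
```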
## Optimization Checklist

### For Latency

- Minimize hop count
- Use QUIC transport
- Geographically optimize relay placement
- Check network latency between hops

### For Throughput

- Increase buffer sizes
- Use QUIC transport
- Check for network bottlenecks
- Monitor for packet loss

### For Memory

- Reduce buffer size if needed
- Limit stream counts
- Monitor active streams
- Set memory limits

### For CPU

- Reduce logging verbosity
- Limit stream counts
- Profile with pprof
- Check for excessive reconnections
## Performance Tuning Guide

### Low Latency Priority

```yaml
routing:
  max_hops: 4  # Limit hops

connections:
  idle_threshold: 60s  # Less keepalive traffic

limits:
  buffer_size: 131072  # 128 KB - smaller buffers
```

### High Throughput Priority

```yaml
limits:
  buffer_size: 524288  # 512 KB - larger buffers
  max_streams_per_peer: 2000

connections:
  idle_threshold: 30s  # Detect issues quickly
```

### Memory Constrained

```yaml
limits:
  buffer_size: 65536  # 64 KB
  max_streams_total: 1000
  max_streams_per_peer: 100
```
## Next Steps

- Protocol Limits - Limit reference
- Common Issues - Quick solutions
- Deployment - Optimize deployment