
Connectivity Troubleshooting
Diagnose and fix network connectivity issues.
Diagnostic Tools
Check Agent Health
# Basic health
curl http://localhost:8080/health
# Detailed status
curl http://localhost:8080/healthz | jq
# Expected output:
{
  "status": "healthy",
  "agent_id": "abc123...",
  "peers": 2,
  "routes": 5,
  "streams": 10
}
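To run the health check from cron or a CI job, it helps to turn the JSON into an exit code. The sketch below is a hypothetical helper (`health_ok` is not part of muti-metroo). It relies only on the `status` and `peers` fields shown in the expected output and parses them with `sed`, so `jq` is optional. With `jq` installed, `jq -e '.status == "healthy" and .peers > 0'` does the same job.

```sh
#!/bin/sh
# Hypothetical helper: succeeds only if the /healthz JSON on stdin reports
# "status": "healthy" and at least one peer. Uses only the two fields shown
# in the expected output; works on pretty-printed or single-line JSON.
health_ok() {
  json=$(cat)
  status=$(printf '%s\n' "$json" | sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' | head -n 1)
  peers=$(printf '%s\n' "$json" | sed -n 's/.*"peers": *\([0-9][0-9]*\).*/\1/p' | head -n 1)
  [ "$status" = "healthy" ] && [ "${peers:-0}" -gt 0 ]
}

# Usage: curl -s http://localhost:8080/healthz | health_ok && echo "agent OK"
```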
Check Peer Connections
# List connected peers
curl http://localhost:8080/agents | jq
Check Routes
# Show the route count reported by /healthz
curl http://localhost:8080/healthz | jq '.routes'
# Trigger route refresh
curl -X POST http://localhost:8080/routes/advertise
Peer Connection Issues
Can't Connect to Peer
Step 1: Check network reachability
# TCP (for HTTP/2, WebSocket)
nc -zv peer-address 4433
telnet peer-address 4433
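# UDP (for QUIC). UDP has no handshake, so "succeeded" only means no ICMP
# port-unreachable came back, not that the listener actually answered.
nc -zvu peer-address 4433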
Step 2: Check DNS resolution
dig peer-hostname
nslookup peer-hostname
Step 3: Check firewall
# On peer host
sudo iptables -L -n | grep 4433
sudo ufw status
# From another host on the same network, query the peer's health endpoint
curl http://peer-address:8080/health
Step 4: Check TLS
# Test TLS connection
openssl s_client -connect peer-address:4433 -CAfile ca.crt
# Verify certificate
muti-metroo cert info ./certs/agent.crt
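Steps 1 through 4 can be chained so that the first failing layer is reported. This is a hypothetical helper (`check_peer` is not a muti-metroo command) and covers TCP transports only (HTTP/2, WebSocket); QUIC listens on UDP and is not tested. It assumes bash is installed (for its `/dev/tcp` redirection) along with coreutils `timeout`, glibc `getent`, and `openssl`.

```sh
#!/bin/sh
# Hypothetical wrapper around Steps 1-4 for TCP transports only (not QUIC/UDP).
# Requires bash (for /dev/tcp), coreutils timeout, getent and openssl.
check_peer() {
  host=$1
  port=${2:-4433}

  # DNS resolution (Step 2); getent also honours /etc/hosts.
  if ! getent hosts "$host" >/dev/null; then
    echo "FAIL dns: $host does not resolve" >&2
    return 1
  fi

  # TCP reachability (Steps 1 and 3): a refusal or timeout points at a
  # firewall rule or a listener that is not running.
  if ! timeout 3 bash -c ': >"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "FAIL tcp: cannot connect to $host:$port" >&2
    return 2
  fi

  # TLS handshake (Step 4). Certificate problems are only reported here, not
  # fatal; add "-CAfile ca.crt -verify_return_error" to make them fail.
  if ! timeout 5 openssl s_client -connect "$host:$port" </dev/null >/dev/null 2>&1; then
    echo "FAIL tls: handshake with $host:$port failed" >&2
    return 3
  fi

  echo "OK $host:$port"
}

# Usage: check_peer peer-address 4433
```

The return code (1 = DNS, 2 = TCP, 3 = TLS) tells you which step to investigate first.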
Peer Disconnects Frequently
Check keepalive settings:
connections:
  idle_threshold: 30s # Send keepalive after 30s idle
  timeout: 90s # Disconnect after 90s with no response
On slow or high-latency networks, increase both values:
connections:
  idle_threshold: 60s
  timeout: 180s
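Both examples keep `timeout` at three times `idle_threshold`. A small sanity check makes that ratio explicit when tuning. It rests on an ASSUMPTION not stated in this guide: that an idle link gets one keepalive per `idle_threshold`, so roughly `timeout / idle_threshold` keepalives can go unanswered before the peer is dropped. `keepalive_budget` is a hypothetical helper and takes plain seconds (30, not "30s").

```sh
#!/bin/sh
# Hypothetical sanity check. ASSUMPTION: one keepalive per idle_threshold while
# idle, so about timeout / idle_threshold keepalives may be lost before disconnect.
keepalive_budget() {
  idle_s=$1
  timeout_s=$2
  if [ "$timeout_s" -le "$idle_s" ]; then
    echo "timeout (${timeout_s}s) should be larger than idle_threshold (${idle_s}s)" >&2
    return 1
  fi
  echo $(( timeout_s / idle_s ))
}

keepalive_budget 30 90    # default values above
keepalive_budget 60 180   # slow-network values
```

A result of 1 means a single lost keepalive would drop the peer, which is the likely cause of frequent disconnects on lossy links.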
Check logs for disconnect reasons:
journalctl -u muti-metroo | grep -i "disconnect\|timeout"
Slow Reconnection
Tune reconnection backoff:
connections:
  reconnect:
    initial_delay: 500ms # Start faster
    max_delay: 30s # Cap sooner
    multiplier: 1.5 # Slower backoff
    jitter: 0.3 # More randomization
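To see what these settings mean in practice, the sketch below prints the first few reconnect delays. It ASSUMES the common capped exponential form, delay_n = min(initial_delay × multiplier^n, max_delay); the agent's exact formula is not documented here. Jitter is left out: its purpose is to spread each delay randomly (here by up to 30%, assuming it is applied as a fraction) so agents that lost a shared link do not all retry in lockstep. `backoff_schedule` is a hypothetical helper taking milliseconds.

```sh
#!/bin/sh
# Prints the first N reconnect delays in ms, without jitter.
# ASSUMPTION: delay_n = min(initial_delay * multiplier^n, max_delay).
# Args: initial_ms max_ms multiplier count
backoff_schedule() {
  awk -v d="$1" -v max="$2" -v m="$3" -v n="$4" 'BEGIN {
    for (i = 0; i < n; i++) {
      if (d > max) d = max                   # never wait longer than max_delay
      printf "%d%s", d, (i < n - 1 ? " " : "\n")
      d = d * m                              # grow the delay for the next attempt
    }
  }'
}

# initial_delay=500ms, max_delay=30s, multiplier=1.5
backoff_schedule 500 30000 1.5 8
```

With multiplier 1.5 the delay roughly doubles every two attempts, so it takes about twelve attempts to reach the 30s cap.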
Transport-Specific Issues
QUIC Not Working
QUIC uses UDP, which may be blocked or throttled.
Test UDP connectivity:
# From client (-w 1 stops nc from waiting indefinitely)
echo "test" | nc -u -w 1 peer-address 4433
# On server, check whether the packet arrives
sudo tcpdump -i any udp port 4433
Common issues:
- Corporate firewalls block UDP
- NAT devices time out idle UDP mappings quickly
- Some ISPs throttle UDP
Solution: Fall back to HTTP/2 or WebSocket:
peers:
  - id: "..."
    transport: h2 # Instead of quic
    address: "peer-address:443"
HTTP/2 Not Working
Test HTTP/2:
curl -v --http2 --cacert ca.crt https://peer-address:8443/mesh
Check TLS:
openssl s_client -connect peer-address:8443 -alpn h2
# Look for "ALPN protocol: h2" in the output
WebSocket Through Proxy
Test proxy connectivity:
# Test CONNECT through proxy
curl -v --proxy http://proxy:8080 https://peer-address:443/
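# Check if the WebSocket upgrade gets through (a "101 Switching Protocols"
# response means it did). --http1.1 is needed because Upgrade does not exist in HTTP/2.
curl -v --http1.1 --proxy http://proxy:8080 \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  https://peer-address:443/mesh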
Configure proxy authentication:
peers:
  - transport: ws
    address: "wss://peer-address:443/mesh"
    proxy: "http://proxy:8080"
    proxy_auth:
      username: "${PROXY_USER}"
      password: "${PROXY_PASS}"
Routing Issues
No Route Found
Error: no route to 10.0.0.5
Step 1: Check if route should exist
# On exit agent
grep -A5 "exit:" /etc/muti-metroo/config.yaml
Step 2: Check route propagation
# On ingress agent
curl http://localhost:8080/healthz | jq '.routes'
Step 3: Check peer connectivity
Routes propagate through peers. If a peer disconnects, the routes learned through it are lost.
curl http://localhost:8080/healthz | jq '.peers'
Step 4: Trigger route advertisement
curl -X POST http://exit-agent:8080/routes/advertise
Step 5: Wait for propagation
Routes can take up to advertise_interval to propagate, so wait at least that long before re-checking.
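Steps 4 and 5 can be scripted as "advertise, then poll until something arrives". The sketch is a hypothetical helper (`wait_for_routes` is not a muti-metroo command). Because the expected `/healthz` output shows `routes` as a count, it can only confirm that some route arrived, not that the specific CIDR you need did.

```sh
#!/bin/sh
# Hypothetical helper: polls /healthz until it reports a non-zero route count
# or the deadline (seconds, default 60) passes.
wait_for_routes() {
  url=${1:-http://localhost:8080/healthz}
  deadline=$(( $(date +%s) + ${2:-60} ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    routes=$(curl -fsS "$url" 2>/dev/null | sed -n 's/.*"routes": *\([0-9][0-9]*\).*/\1/p' | head -n 1)
    if [ "${routes:-0}" -gt 0 ]; then
      echo "$routes routes known"
      return 0
    fi
    sleep 2
  done
  echo "no routes after ${2:-60}s" >&2
  return 1
}

# Usage: curl -X POST http://exit-agent:8080/routes/advertise && wait_for_routes
```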
Route Expired
Routes expire after route_ttl without refresh.
# Check route TTL
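grep route_ttl /etc/muti-metroo/config.yaml
# If the exit agent stays disconnected longer than route_ttl, its routes expire.
# Reconnect the exit agent, then trigger advertisement:
curl -X POST http://exit-agent:8080/routes/advertise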
Wrong Route Selected
Routes are selected by:
- Longest prefix match
- Lowest metric (hop count) if tied
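The two selection rules can be illustrated with a short function. This is a sketch of the rules as stated, not muti-metroo's implementation: routes are read from stdin in a made-up "CIDR METRIC NEXT_HOP" format, only IPv4 is handled, and `select_route`, `ip_to_int`, and the agent names are hypothetical.

```sh
#!/bin/sh
# Illustration only: longest prefix wins, lowest metric (hop count) breaks ties.
# Input lines: "CIDR METRIC NEXT_HOP" (hypothetical format), IPv4 only.

ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1                  # split the dotted quad into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

select_route() {
  dst=$(ip_to_int "$1")
  best="" best_len=-1 best_metric=0
  while read -r cidr metric hop; do
    [ -n "$cidr" ] || continue
    net=$(ip_to_int "${cidr%/*}")
    len=${cidr#*/}
    mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    # Skip routes whose prefix does not cover the destination.
    [ $(( (dst & mask) == (net & mask) )) -eq 1 ] || continue
    # Longer prefix wins; on equal length, the lower metric wins.
    if [ "$len" -gt "$best_len" ] || { [ "$len" -eq "$best_len" ] && [ "$metric" -lt "$best_metric" ]; }; then
      best="$cidr via $hop" best_len=$len best_metric=$metric
    fi
  done
  [ -n "$best" ] && echo "$best"
}

printf '%s\n' "0.0.0.0/0 1 agent-a" "10.0.0.0/8 3 agent-b" \
  "10.0.0.0/24 2 agent-c" "10.0.0.0/24 1 agent-d" | select_route 10.0.0.5
# prints: 10.0.0.0/24 via agent-d
```

Note that the /8 route loses to both /24 routes even though its metric would only matter on a tie; a more specific route always wins first.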
Debug route selection:
# Enable debug logging
muti-metroo run --log-level debug
# Look for route lookup logs
journalctl -u muti-metroo | grep "route lookup"
Stream Issues
Streams Not Opening
Error: stream open timeout
Causes:
- Network latency too high
- Too many hops
- Exit agent overloaded
Solutions:
- Increase the timeout:
  limits:
    stream_open_timeout: 60s
- Check that each hop is responsive
- Reduce the hop count if possible
Streams Dying
Check stream metrics:
curl http://localhost:8080/metrics | grep stream
Common causes:
- Idle timeout
- Buffer exhaustion
- Network issues
Network Diagnostics
Capture Traffic
# QUIC (UDP)
sudo tcpdump -i any udp port 4433 -w quic.pcap
# HTTP/2, WebSocket (TCP)
sudo tcpdump -i any tcp port 443 -w tcp.pcap
Monitor Connections
# Watch connection states (ss -tuan is the modern replacement for netstat -an)
watch -n 1 'netstat -an | grep 4433'
# Count connections
netstat -an | grep 4433 | wc -l
Latency Testing
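# Measure round-trip time (ICMP may be filtered)
ping -c 5 peer-address
# Measure TCP handshake latency (hping3 needs root)
sudo hping3 -S -p 443 -c 5 peer-address
# Time a full request through the SOCKS5 proxy (includes the stream open)
time curl -x socks5://localhost:1080 https://example.com -o /dev/null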
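A single `time curl` run is noisy. The sketch below repeats the request and reports average and maximum total time using curl's `-w '%{time_total}'`. Each sample includes the stream open across the mesh plus the HTTP exchange itself, so treat it as an upper bound on stream setup. `stream_latency` is a hypothetical helper; passing `socks5h://` instead of `socks5://` makes curl let the proxy resolve the hostname.

```sh
#!/bin/sh
# Hypothetical helper: times N requests through a SOCKS5 proxy and prints
# the average and maximum curl time_total in seconds.
# Args: proxy url count
stream_latency() {
  proxy=${1:-socks5://localhost:1080}
  url=${2:-https://example.com}
  n=${3:-5}
  times=""
  i=0
  while [ "$i" -lt "$n" ]; do
    t=$(curl -sS -o /dev/null -w '%{time_total}' --max-time 30 -x "$proxy" "$url") || return 1
    times="$times $t"
    i=$(( i + 1 ))
  done
  [ -n "$times" ] || return 1
  printf '%s\n' $times | awk '{ s += $1; if ($1 > m) m = $1 }
    END { printf "n=%d avg=%.3fs max=%.3fs\n", NR, s / NR, m }'
}

# Usage: stream_latency socks5://localhost:1080 https://example.com 10
```

A large gap between average and maximum points at an intermittently slow hop rather than uniformly high latency.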
Checklist
- Network reachable (ping, nc, telnet)
- Firewall allows traffic
- DNS resolves correctly
- TLS certificates valid
- Peer ID matches
- Routes advertised
- Logs show no errors