Prometheus Monitoring
WuEasy Gateway has built-in Prometheus support for monitoring the gateway's runtime metrics.
Configuring Monitoring
Configure monitoring in config.yaml:
yaml
gateway:
  monitor:
    enabled: true              # enable monitoring
    port: "9090"               # port of the monitoring service
    path: "/metrics"           # path of the metrics endpoint
    metrics:
      http-enabled: true       # enable HTTP metrics
      upstream-enabled: true   # enable upstream service metrics
      filter-enabled: true     # enable filter metrics
      cache-enabled: true      # enable cache metrics
      session-enabled: true    # enable session metrics
      error-enabled: true      # enable error metrics
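Available Metrics
HTTP Request Metrics
- gateway_http_requests_total: total number of HTTP requests
- gateway_http_request_duration_seconds: HTTP request duration
- gateway_http_requests_in_flight: number of requests currently in flight
- gateway_http_request_size_bytes: HTTP request size
- gateway_http_response_size_bytes: HTTP response size
Upstream Service Metrics
- gateway_upstream_requests_total: total number of upstream service requests
- gateway_upstream_request_duration_seconds: upstream service request duration
Filter Metrics
- gateway_filter_executions_total: number of filter executions
- gateway_filter_duration_seconds: filter execution duration
Cache Metrics
- gateway_cache_hits_total: number of cache hits
- gateway_cache_misses_total: number of cache misses
Session Metrics
- gateway_active_sessions: number of currently active sessions
Rate Limiting Metrics
- gateway_rate_limiter_triggers_total: number of times the rate limiter was triggered
Error Metrics
- gateway_errors_total: total number of errors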
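Which of these metrics a given gateway actually exposes depends on the enabled metric groups. As a minimal sketch, the helper below lists the gateway_ metric families found in a scrape. The function name list_gateway_metrics is made up for this example, and it assumes the endpoint emits the standard "# TYPE" lines of the Prometheus text exposition format.

```shell
#!/bin/sh
# list_gateway_metrics (hypothetical helper): read Prometheus text exposition
# format on stdin and print the distinct metric family names starting with
# "gateway_". Each "# TYPE <name> <type>" line declares one metric family.
list_gateway_metrics() {
  awk '$1 == "#" && $2 == "TYPE" && $3 ~ /^gateway_/ { print $3 }' | sort -u
}

# Example against a local gateway (assumes the default port and path):
#   curl -s http://localhost:9090/metrics | list_gateway_metrics
```

Comparing the output with the list above is a quick way to confirm that a disabled group really is gone from the scrape.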
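Accessing the Metrics
Once the gateway is running, the monitoring endpoints are available at these URLs:
- Monitoring home page: http://localhost:9090/
- Prometheus metrics: http://localhost:9090/metrics
- Health check: http://localhost:9090/health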
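The endpoints above can also be checked from a script, for example in a deployment step. The sketch below only looks at the HTTP status, since the response body of /health is not documented here. The function name check_endpoint is illustrative.

```shell
#!/bin/sh
# check_endpoint URL (hypothetical helper): succeed only if URL answers with
# a successful HTTP status.
#   -f  treat HTTP errors (status >= 400) as a failure
#   -sS stay quiet but still print real errors
#   --max-time 5  give up after five seconds instead of hanging
check_endpoint() {
  curl -fsS --max-time 5 -o /dev/null "$1"
}

# Example (assumes the default monitor port):
#   check_endpoint http://localhost:9090/health  && echo "health ok"
#   check_endpoint http://localhost:9090/metrics && echo "metrics ok"
```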
Integrating with Prometheus
1. Configure Prometheus
Add the gateway as a scrape target in the prometheus.yml configuration file:
yaml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'wueasy-gateway'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 5s
    metrics_path: /metrics
2. Start Prometheus
bash
prometheus --config.file=prometheus.yml
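3. Open Prometheus
Open http://localhost:9090 in a browser (9090 is the Prometheus default port). This is also the gateway's monitor port in the configuration above, so when both run on the same host, one of them has to be moved to a different port.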
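Prometheus and the gateway's monitor endpoint both default to port 9090, so they cannot share one host without a change. One possible way around this, sketched below, is to validate the configuration with promtool and start Prometheus on another port through its --web.listen-address flag. The port 9091 and the function name start_prometheus are assumptions for illustration.

```shell
#!/bin/sh
# Port the gateway's monitor endpoint listens on (from config.yaml).
GATEWAY_MONITOR_PORT=9090
# Assumed alternative port for Prometheus on the same host.
PROMETHEUS_PORT=9091

# start_prometheus (hypothetical helper): refuse to start on the gateway's
# port, validate prometheus.yml, then run Prometheus in the foreground.
start_prometheus() {
  if [ "$PROMETHEUS_PORT" = "$GATEWAY_MONITOR_PORT" ]; then
    echo "port $PROMETHEUS_PORT is already used by the gateway monitor" >&2
    return 1
  fi
  promtool check config prometheus.yml &&
    prometheus --config.file=prometheus.yml \
      --web.listen-address=":${PROMETHEUS_PORT}"
}
```

With this setup the Prometheus UI would be at http://localhost:9091, while the scrape target in prometheus.yml stays localhost:9090.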
Common Query Examples
Request Rate
promql
# Requests per second
rate(gateway_http_requests_total[5m])
# Request rate grouped by status code
sum(rate(gateway_http_requests_total[5m])) by (status)
Response Time
promql
# Average response time
rate(gateway_http_request_duration_seconds_sum[5m]) / rate(gateway_http_request_duration_seconds_count[5m])
# 95th percentile response time
histogram_quantile(0.95, rate(gateway_http_request_duration_seconds_bucket[5m]))
Error Rate
promql
# 4xx error ratio
sum(rate(gateway_http_requests_total{status=~"4.."}[5m])) / sum(rate(gateway_http_requests_total[5m]))
# 5xx error ratio
sum(rate(gateway_http_requests_total{status=~"5.."}[5m])) / sum(rate(gateway_http_requests_total[5m]))
Concurrent Requests
promql
# Requests currently in flight
gateway_http_requests_in_flight
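The queries above can also be run outside the web UI through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. A minimal sketch follows. The function name prom_query and the PROM_URL variable are made up for this example, and the default URL assumes Prometheus is reachable on localhost:9090.

```shell
#!/bin/sh
# Base URL of the Prometheus server (assumption: local default port).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# prom_query EXPR (hypothetical helper): run an instant query and print the
# JSON response. --get sends the URL-encoded expression as a query string.
prom_query() {
  curl -fsS --max-time 10 --get "${PROM_URL}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Example:
#   prom_query 'gateway_http_requests_in_flight'
```

URL-encoding the expression matters because PromQL uses characters such as braces, quotes and brackets that are not safe in a raw query string.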
Integrating with Grafana
1. Add a Prometheus Data Source
In Grafana, add a Prometheus data source and set its URL to http://localhost:9090
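Instead of clicking through the UI, the data source can be declared in a Grafana provisioning file. The sketch below writes such a file. The function name write_grafana_datasource and the file name prometheus.yaml are illustrative, and the target directory depends on how Grafana is installed (package installs typically read /etc/grafana/provisioning/datasources).

```shell
#!/bin/sh
# write_grafana_datasource DIR (hypothetical helper): write a data source
# provisioning file for the Prometheus server into DIR.
write_grafana_datasource() {
  dir="$1"
  mkdir -p "$dir" || return 1
  cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Example (assumed path, requires write access and a Grafana restart):
#   write_grafana_datasource /etc/grafana/provisioning/datasources
```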
2. Create a Dashboard
A dashboard can include the following panels:
- Request rate trend
- Response time distribution
- Error rate trend
- Concurrent requests
- Upstream service performance
- Filter performance
3. Example Queries
promql
# Request rate panel
sum(rate(gateway_http_requests_total[5m])) by (method, route)
# Response time panel
histogram_quantile(0.50, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le))
# Error rate panel (percent)
sum(rate(gateway_http_requests_total{status=~"[45].."}[5m])) / sum(rate(gateway_http_requests_total[5m])) * 100
Alerting Rules
The following alerting rules can be configured:
yaml
groups:
  - name: wueasy-gateway
    rules:
      - alert: HighErrorRate
        expr: sum(rate(gateway_http_requests_total{status=~"[45].."}[5m])) / sum(rate(gateway_http_requests_total[5m])) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Gateway error rate is high"
          description: "Gateway error rate is {{ $value | humanizePercentage }}"
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le)) > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Gateway response time is high"
          description: "Gateway 95th percentile response time is {{ $value }}s"
      - alert: GatewayDown
        expr: up{job="wueasy-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gateway is down"
          description: "Gateway has been down for more than 1 minute"
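Prometheus only evaluates these rules once the file is listed under rule_files in prometheus.yml, and a syntax error in the file prevents it from loading. As a small sketch, the rules can be validated with promtool before reloading. The function name check_alert_rules and the file name gateway-alerts.yml are assumptions for illustration.

```shell
#!/bin/sh
# check_alert_rules FILE (hypothetical helper): validate an alerting rules
# file with promtool, which ships with Prometheus.
check_alert_rules() {
  if [ ! -f "$1" ]; then
    echo "rules file not found: $1" >&2
    return 1
  fi
  promtool check rules "$1"
}

# Example (assumes the rules above were saved as gateway-alerts.yml and that
# prometheus.yml contains:  rule_files: ['gateway-alerts.yml']):
#   check_alert_rules gateway-alerts.yml
```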
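Notes
- Monitoring adds some performance overhead; in production, tune the monitoring configuration to what you actually need
- Keep the monitoring port separate from the main service port to avoid conflicts
- Clean up historical monitoring data regularly so that storage does not fill up
- Use the metrics section of the configuration file to enable only the metric types you need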
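If the historical data lives in Prometheus, regular cleanup can be left to its own retention setting rather than done by hand. Prometheus keeps 15 days by default and accepts a different period through --storage.tsdb.retention.time. The sketch below wraps that flag. The function name prometheus_with_retention is made up for this example.

```shell
#!/bin/sh
# prometheus_with_retention DAYS (hypothetical helper): start Prometheus and
# keep only the last DAYS days of samples on disk.
prometheus_with_retention() {
  case "$1" in
    ''|*[!0-9]*)
      echo "usage: prometheus_with_retention DAYS" >&2
      return 2
      ;;
  esac
  prometheus --config.file=prometheus.yml \
    --storage.tsdb.retention.time="${1}d"
}

# Example: keep one week of data
#   prometheus_with_retention 7
```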