Prometheus Monitoring

WuEasy Gateway has built-in Prometheus support and exposes runtime metrics for the gateway.

Configuring monitoring

Configure monitoring in the config.yaml file:

yaml

gateway:
  monitor:
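    enabled: true          # Enable monitoring
    port: "9090"           # Port of the monitoring server
    path: "/metrics"       # Path of the metrics endpoint
    metrics:
      http-enabled: true        # Enable HTTP metrics
      upstream-enabled: true    # Enable upstream service metrics
      filter-enabled: true      # Enable filter metrics
      cache-enabled: true       # Enable cache metrics
      session-enabled: true     # Enable session metrics
      error-enabled: true       # Enable error metrics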

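Available metrics

HTTP request metrics

gateway_http_requests_total - Total number of HTTP requests
gateway_http_request_duration_seconds - HTTP request duration
gateway_http_requests_in_flight - Number of requests currently in flight
gateway_http_request_size_bytes - HTTP request size
gateway_http_response_size_bytes - HTTP response size

Upstream service metrics

gateway_upstream_requests_total - Total number of requests to upstream services
gateway_upstream_request_duration_seconds - Upstream request duration

Filter metrics

gateway_filter_executions_total - Number of filter executions
gateway_filter_duration_seconds - Filter execution duration

Cache metrics

gateway_cache_hits_total - Number of cache hits
gateway_cache_misses_total - Number of cache misses

Session metrics

gateway_active_sessions - Number of currently active sessions

Rate limiting metrics

gateway_rate_limiter_triggers_total - Number of times the rate limiter was triggered

Error metrics

gateway_errors_total - Total number of errors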

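Accessing the metrics

Once the gateway is running, the monitoring endpoints are available at the following URLs:

Monitoring home page: http://localhost:9090/
Prometheus metrics: http://localhost:9090/metrics
Health check: http://localhost:9090/health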
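
To check what the endpoint returns without running Prometheus, the text output can be read directly. The following is a minimal sketch, assuming the gateway serves the standard Prometheus text exposition format at the URL above. The function names `fetch_metrics` and `parse_samples` are illustrative and not part of the gateway, and the parser is deliberately simplified: it does not handle label values that contain commas or escaped quotes.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9090/metrics", timeout=5.0):
    """Download the raw text exposition from the metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples.

    Simplified: skips comment lines (# HELP / # TYPE) and assumes label
    values contain no commas or escaped quotes.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        labels = {}
        if "{" in line:
            # name{key="value",...} value [timestamp]
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
            for pair in label_part.split(","):
                if not pair.strip():
                    continue  # tolerate a trailing comma
                key, _, raw = pair.partition("=")
                labels[key.strip()] = raw.strip().strip('"')
        else:
            # name value [timestamp]
            name, value_part = line.split(None, 1)
        value = float(value_part.split()[0])
        samples.append((name.strip(), labels, value))
    return samples
```

With the gateway running, `parse_samples(fetch_metrics())` returns one tuple per exposed sample, which makes it easy to confirm that a metric such as `gateway_http_requests_in_flight` is actually being published.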

Prometheus integration

1. Configure Prometheus

Add the gateway as a scrape target in the prometheus.yml configuration file:

yaml

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'wueasy-gateway'
    static_configs:
      - targets: ['localhost:9090']
    scrape_interval: 5s
    metrics_path: /metrics

2. Start Prometheus

bash

prometheus --config.file=prometheus.yml

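3. Open Prometheus

Open http://localhost:9090 in a browser (9090 is the Prometheus default port). The gateway's monitoring port in the example configuration is also 9090, so if both run on the same host, one of them has to use a different port, for example by changing gateway.monitor.port or by starting Prometheus with the --web.listen-address flag.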

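Common query examples

Request rate

promql

# Requests per second, per series, averaged over the last 5 minutes
rate(gateway_http_requests_total[5m])

# Request rate grouped by status code
sum(rate(gateway_http_requests_total[5m])) by (status)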
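
To make the meaning of `rate()` concrete, the calculation can be approximated by hand from raw counter samples. The sketch below is a simplified model, not Prometheus's actual implementation: it sums the increases between consecutive samples, treats a drop in value as a counter reset, and divides by the elapsed time. The real `rate()` additionally extrapolates to the edges of the range window. The function name `simple_rate` and the sample values are illustrative.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when there are fewer than two samples or no time elapsed.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset (e.g. gateway restart): counting restarted at zero.
            increase += value
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed


# Three scrapes, 15 seconds apart: the counter grows by 60 in 30 seconds.
requests_per_second = simple_rate([(0, 100), (15, 130), (30, 160)])
```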

Response time

promql

# Average response time
rate(gateway_http_request_duration_seconds_sum[5m]) / rate(gateway_http_request_duration_seconds_count[5m])

# 95th percentile response time
histogram_quantile(0.95, rate(gateway_http_request_duration_seconds_bucket[5m]))
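
The percentile returned by `histogram_quantile` is an estimate interpolated from cumulative bucket counts, so its accuracy depends on the bucket boundaries. The sketch below reproduces the interpolation for classic histograms under simplifying assumptions (no native histograms, no special handling of malformed bucket sets). The bucket boundaries and counts are made-up illustration values, not the gateway's actual buckets.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count); the highest bound
    must be math.inf, as with the `le="+Inf"` bucket.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper = buckets[index][0]
    if index == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0
    else:
        lower, below = buckets[index - 1]
    in_bucket = buckets[index][1] - below
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - below) / in_bucket


# Hypothetical latency buckets (seconds) with cumulative counts.
example = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, example)
```

In this example the 95th observation out of 100 falls in the 0.5 s to 1.0 s bucket, five eighths of the way through it, so the estimate is 0.8125 s.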

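Error rate

promql

# 4xx error ratio (both sides are summed so that the error rate is divided by the total rate, not each series by itself)
sum(rate(gateway_http_requests_total{status=~"4.."}[5m])) / sum(rate(gateway_http_requests_total[5m]))

# 5xx error ratio
sum(rate(gateway_http_requests_total{status=~"5.."}[5m])) / sum(rate(gateway_http_requests_total[5m]))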
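
An error ratio is the request rate of the error class divided by the request rate across all status codes. The arithmetic can be sketched as follows, assuming per-status request rates have already been obtained. The function name `error_ratio` and the numbers are illustrative only.

```python
def error_ratio(rates_by_status, prefix):
    """Share of requests whose status code starts with `prefix`.

    rates_by_status: dict mapping a status code string (e.g. "404")
    to its per-second request rate.
    Returns None when there is no traffic at all.
    """
    total = sum(rates_by_status.values())
    if total == 0:
        return None
    errors = sum(
        rate for status, rate in rates_by_status.items()
        if status.startswith(prefix)
    )
    return errors / total


# Hypothetical per-second rates by status code.
rates = {"200": 90.0, "404": 8.0, "500": 2.0}
ratio_4xx = error_ratio(rates, "4")
ratio_5xx = error_ratio(rates, "5")
```

Here 8 of 100 requests per second are 4xx and 2 of 100 are 5xx, giving ratios of 0.08 and 0.02.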

Concurrent connections

promql

# Number of requests currently in flight
gateway_http_requests_in_flight

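Grafana integration

1. Add a Prometheus data source

In Grafana, add a Prometheus data source and set its URL to the address of the Prometheus server (not the gateway's metrics endpoint), for example: http://localhost:9090

2. Create a dashboard

A dashboard can include panels such as:

Request rate trend
Response time distribution
Error rate trend
Concurrent connections
Upstream service performance
Filter performance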

3. Example queries

promql

# Request rate panel
sum(rate(gateway_http_requests_total[5m])) by (method, route)

# Response time panel (50th, 95th, and 99th percentiles)
histogram_quantile(0.50, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le))

# Error rate panel (percentage of 4xx and 5xx responses)
sum(rate(gateway_http_requests_total{status=~"[45].."}[5m])) / sum(rate(gateway_http_requests_total[5m])) * 100
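
Panel queries can be tried outside Grafana by sending them to the Prometheus HTTP API, which serves instant queries at `/api/v1/query`. The sketch below assumes a reachable Prometheus server and a query that returns an instant vector. The helper names `build_query_url`, `extract_values`, and `run_query` are illustrative.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def extract_values(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got %s" % data["resultType"])
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(base_url, promql, timeout=5.0):
    """Send the query to Prometheus and return the parsed samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_values(json.load(resp))
```

For example, `run_query("http://localhost:9090", "sum(rate(gateway_http_requests_total[5m])) by (method, route)")` would return one pair per method and route combination, provided Prometheus is listening at that address.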

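Alerting rules

The following alerting rules can be configured. Save them in a rules file and list that file under rule_files in prometheus.yml: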

yaml

groups:
  - name: wueasy-gateway
    rules:
      - alert: HighErrorRate
        expr: sum(rate(gateway_http_requests_total{status=~"[45].."}[5m])) / sum(rate(gateway_http_requests_total[5m])) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Gateway error rate is high"
          description: "Gateway error rate is {{ $value | humanizePercentage }}"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum(rate(gateway_http_request_duration_seconds_bucket[5m])) by (le)) > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Gateway response time is high"
          description: "Gateway 95th percentile response time is {{ $value }}s"

      - alert: GatewayDown
        expr: up{job="wueasy-gateway"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Gateway is down"
          description: "Gateway has been down for more than 1 minute"
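
The `for` clause in these rules means the expression must stay true continuously for that duration before the alert fires; until then the alert is pending, and a single false evaluation resets it. A simplified model of that behavior, assuming rules are evaluated at a fixed interval, is sketched below. The function name `alert_states` and the evaluation sequence are illustrative.

```python
def alert_states(condition_by_eval, eval_interval, for_duration):
    """Return the alert state after each rule evaluation.

    condition_by_eval: list of booleans, one per evaluation, telling
    whether the alert expression returned a result.
    eval_interval and for_duration are in seconds.
    """
    states = []
    active_since = None
    for index, condition in enumerate(condition_by_eval):
        now = index * eval_interval
        if not condition:
            # Any evaluation where the condition is false resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        if now - active_since >= for_duration:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Evaluated every 60 s with `for: 2m`, as in HighErrorRate above.
timeline = alert_states([False, True, True, True, False, True], 60, 120)
```

The resulting timeline is inactive, pending, pending, firing, inactive, pending: the brief recovery at the fifth evaluation clears the alert, and the timer starts again from zero.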

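Notes

Monitoring adds some performance overhead; in production, adjust the monitoring configuration to what is actually needed.
Keep the monitoring port separate from the main service port to avoid conflicts.
Clean up historical monitoring data regularly, or limit how long it is retained, so that storage does not fill up.
Use the metrics section of the configuration file to enable only the metric types that are needed.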
