Effective Metric Collection

Key Metric Types

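  1. Counter Metrics
# Counter: a cumulative value that only increases (and restarts at zero when the process restarts)
http_requests_total{status="200", handler="/api/v1"}
  2. Gauge Metrics
# Gauge: a point-in-time value that can go up or down, such as resident memory
process_resident_memory_bytes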
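
The difference between the two types can be sketched in a few lines of Python. The `Counter` and `Gauge` classes below are hypothetical stand-ins written for illustration, not the API of any Prometheus client library, and the values are made up.

```python
class Counter:
    """Cumulative value: may only increase (illustrative stand-in)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Point-in-time value: may be set higher or lower (illustrative stand-in)."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


# Three requests handled: the counter only ever moves up.
requests_total = Counter()
for _ in range(3):
    requests_total.inc()

# Memory grows and then shrinks: the gauge simply follows the current value.
resident_memory_bytes = Gauge()
resident_memory_bytes.set(52_428_800)
resident_memory_bytes.set(41_943_040)
```

Because a counter never decreases, its raw value is rarely graphed directly; it is usually wrapped in rate(). A gauge can be graphed as-is.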

PromQL Best Practices

Rate Calculations

# Per-second request rate, averaged over the last 5 minutes
rate(http_requests_total[5m])

# Share of requests that returned a 5xx status, as a percentage
(
  sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
  sum(rate(http_requests_total[5m]))
) * 100
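
To make the arithmetic behind these queries concrete, here is a minimal Python sketch that derives a per-second rate from raw counter samples and treats any decrease as a counter reset. It is deliberately simplified: PromQL's rate() also extrapolates to the window boundaries, which this sketch does not. The function name `simple_rate` and the sample data are assumptions made for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase across the samples, tolerating counter resets.

    Simplified on purpose: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical samples scraped every 15s; both counters reset before the last scrape.
all_requests = [(0, 100), (15, 130), (30, 160), (45, 20)]
errors = [(0, 5), (15, 8), (30, 11), (45, 2)]

error_pct = simple_rate(errors) / simple_rate(all_requests) * 100
```

With this data the total increase is 80 requests, 8 of them errors, so `error_pct` comes out at roughly 10.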

Alert Configuration

Alert Rules Example

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: |
      (
        sum(rate(http_requests_total{status=~"5.."}[5m]))
        /
        sum(rate(http_requests_total[5m]))
      ) * 100 > 5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: High HTTP error rate
      description: 'HTTP 5xx error rate is {{ printf "%.2f" $value }}% (threshold: 5%)'
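
The `for: 5m` clause keeps the alert in a pending state until the expression has been true on every evaluation for five minutes; a single false evaluation resets the timer. A rough Python sketch of that state machine follows. The function `alert_states`, the one-minute evaluation interval, and the error percentages are assumptions chosen for illustration, not Prometheus internals.

```python
from typing import List, Optional


def alert_states(values: List[float], threshold: float,
                 for_seconds: int, eval_interval: int) -> List[str]:
    """Return the alert state after each evaluation of the expression."""
    states = []
    active_since: Optional[int] = None  # time the condition first became true
    for i, value in enumerate(values):
        now = i * eval_interval
        if value > threshold:
            if active_since is None:
                active_since = now
            held = now - active_since
            states.append("firing" if held >= for_seconds else "pending")
        else:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
    return states


# Hypothetical error percentages, evaluated once a minute against "> 5" with for: 5m.
history = [2, 6, 7, 8, 9, 7, 6, 3, 6]
states = alert_states(history, threshold=5, for_seconds=300, eval_interval=60)
```

In this run the alert is pending for five evaluations, fires on the sixth consecutive breach, and drops back to inactive as soon as the value falls under the threshold.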

Recording Rules

groups:
- name: example
  rules:
  - record: job:http_inprogress_requests:sum
    expr: sum by (job) (http_inprogress_requests)
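
What the recording rule precomputes can be sketched as a group-and-sum over the current samples. The helper `sum_by` and the sample series below are hypothetical; the sketch only mirrors the grouping behaviour of `sum by (job)`, where every label except `job` is dropped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One instant-vector sample: (labels, value)
Series = Tuple[Dict[str, str], float]


def sum_by(samples: List[Series], label: str) -> Dict[str, float]:
    """Sum sample values per distinct value of `label`."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        # Series without the label are grouped under the empty string.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical in-progress request counts from three instances.
inprogress = [
    ({"job": "api", "instance": "a:9090"}, 3.0),
    ({"job": "api", "instance": "b:9090"}, 5.0),
    ({"job": "worker", "instance": "c:9090"}, 2.0),
]

per_job = sum_by(inprogress, "job")
```

Storing this result under `job:http_inprogress_requests:sum` lets dashboards and alerts read one series per job instead of re-aggregating every instance on each query.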

Retention and Storage

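  1. Storage Configuration
# prometheus.yml: how often targets are scraped and rules are evaluated
global:
  scrape_interval: 15s
  evaluation_interval: 15s
# In Prometheus 2.x, retention is set with command-line flags rather than in prometheus.yml.
# When both limits are set, whichever is reached first takes effect:
#   --storage.tsdb.retention.time=15d
#   --storage.tsdb.retention.size=512GB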
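
A size limit is easier to choose with a rough capacity estimate. The Prometheus storage documentation gives the rule of thumb: disk space = retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The Python sketch below applies it; the function name and the figure of one million active series are assumptions for illustration.

```python
def estimate_disk_bytes(retention_days: float, active_series: int,
                        scrape_interval_s: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough TSDB disk estimate; 2 bytes per sample is the pessimistic end."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumed workload: 1,000,000 active series, 15s scrape interval, 15d retention.
needed = estimate_disk_bytes(retention_days=15, active_series=1_000_000,
                             scrape_interval_s=15)
needed_gb = needed / 1e9
```

Under these assumptions the estimate is about 172.8 GB, which leaves headroom under a 512GB size limit. Treat the result as an order-of-magnitude guide, since series churn and the write-ahead log add overhead.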

Production Example

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-monitor
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
  - port: metrics
    interval: 10s
    path: /metrics/critical
    metricRelabelings:
    - sourceLabels: [__name__]
      regex: 'http_requests_total'
      action: keep
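
The `keep` action on the second endpoint drops every scraped sample whose metric name does not match the regex in full. A small Python sketch of that filter is shown below. The function `keep` and the metric names are hypothetical, and Python's `re` module stands in for the RE2 engine Prometheus uses, which is adequate for a literal pattern like this one.

```python
import re
from typing import Dict, List


def keep(samples: List[Dict[str, str]], source_labels: List[str],
         regex: str, separator: str = ";") -> List[Dict[str, str]]:
    """Keep only samples whose joined source label values fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in samples:
        joined = separator.join(labels.get(name, "") for name in source_labels)
        # Relabel regexes are anchored, so a partial match is not enough.
        if pattern.fullmatch(joined):
            kept.append(labels)
    return kept


# Hypothetical samples exposed on /metrics/critical.
scraped = [
    {"__name__": "http_requests_total", "status": "200"},
    {"__name__": "http_requests_total", "status": "500"},
    {"__name__": "http_requests_total_debug", "status": "200"},
    {"__name__": "process_resident_memory_bytes"},
]

kept = keep(scraped, ["__name__"], "http_requests_total")
```

Only the two `http_requests_total` samples survive, which keeps the 10s scrape cheap: the fast endpoint stores just the series the error-rate alert depends on.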