# Custom Metrics Scaling in Kubernetes

While Kubernetes provides built-in scaling based on CPU and memory usage, real-world applications often need to scale based on business-specific metrics. Whether it's database connections, queue length, or request latency, custom metrics scaling lets you adapt your infrastructure to your application's unique needs. Let's explore how to implement this in a production environment.

## Why Custom Metrics Scaling?

Traditional resource-based scaling (CPU/memory) often fails to capture the true load on your system. Consider these scenarios: ...
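To make the idea concrete, the end result is usually a HorizontalPodAutoscaler keyed to an application metric instead of CPU. The sketch below is a minimal example: it assumes a metrics adapter (for instance, the Prometheus Adapter) already exposes a `queue_messages_ready` pods metric through the custom metrics API, and the metric name, target value, and Deployment name are illustrative rather than part of any particular setup.

```yaml
# Minimal sketch: scale a worker Deployment on queue depth instead of CPU.
# Assumes "queue_messages_ready" is served through the custom metrics API
# by an adapter such as the Prometheus Adapter; all names are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: queue_messages_ready
        target:
          type: AverageValue
          averageValue: "30"
```

With this in place, the HPA polls the custom metrics API for the average queue depth per pod and adds replicas whenever it exceeds the target, which is exactly the behavior CPU-based scaling cannot express.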


# Prometheus Monitoring: SRE Best Practices and Implementation

## Effective Metric Collection

### Key Metric Types

#### Counter Metrics

```promql
# Example counter metric
http_requests_total{status="200", handler="/api/v1"}
```

#### Gauge Metrics

```promql
# Memory usage example
process_resident_memory_bytes
```

## PromQL Best Practices

### Rate Calculations

```promql
# Request rate over 5 minutes
rate(http_requests_total[5m])

# Error rate percentage
sum(rate(http_requests_total{status=~"5.."}[5m]))
  /
sum(rate(http_requests_total[5m])) * 100
```

## Alert Configuration

### Alert Rules Example

```yaml
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum(rate(http_requests_total[5m])) * 100 > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High HTTP error rate
          description: "Error rate is {{ $value }}%"
```

### Recording Rules

```yaml
groups:
  - name: example
    rules:
      - record: job:http_inprogress_requests:sum
        expr: sum by (job) (http_inprogress_requests)
```

## Retention and Storage

### Storage Configuration

```yaml
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Note: TSDB retention is not set in prometheus.yml; pass it as flags
# when starting Prometheus:
#   --storage.tsdb.retention.time=15d
#   --storage.tsdb.retention.size=512GB
```

## Production Example

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-monitor
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
    - port: metrics
      interval: 10s
      path: /metrics/critical
      metricRelabelings:
        - sourceLabels: [__name__]
          regex: 'http_requests_total'
          action: keep
```
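If the cluster runs the Prometheus Operator, which the ServiceMonitor above implies, the alerting and recording rules can be managed the same way: as PrometheusRule objects that the operator loads into Prometheus. The sketch below simply repackages the HighErrorRate rule from earlier; the object name and the `release: prometheus` label are assumptions and must match whatever ruleSelector your Prometheus object uses.

```yaml
# Minimal sketch: the HighErrorRate alert packaged as a PrometheusRule
# for the Prometheus Operator. The metadata name and label are
# illustrative and must match your Prometheus object's ruleSelector.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-alert-rules
  labels:
    release: prometheus
spec:
  groups:
    - name: api.rules
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              /
            sum(rate(http_requests_total[5m])) * 100 > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: High HTTP error rate
```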
