Prometheus Monitoring: SRE Best Practices and Implementation

Effective Metric Collection

Key Metric Types

Counter Metrics

    # Example counter metric
    http_requests_total{status="200", handler="/api/v1"}

Gauge Metrics

    # Memory usage example
    process_resident_memory_bytes

PromQL Best Practices

Rate Calculations

    # Request rate over 5 minutes
    rate(http_requests_total[5m])

    # Error rate percentage
    sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

Alert Configuration

Alert Rules Example

    groups:
      - name: example
        rules:
          - alert: HighErrorRate
            expr: |
              sum(rate(http_requests_total{status=~"5.."}[5m]))
                / sum(rate(http_requests_total[5m])) * 100 > 5
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: High HTTP error rate
              description: "Error rate is {{ $value }}%"

Recording Rules

    groups:
      - name: example
        rules:
          - record: job:http_inprogress_requests:sum
            expr: sum by (job) (http_inprogress_requests)

Retention and Storage

Storage Configuration

    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    # Retention is commonly set with the command-line flags
    # --storage.tsdb.retention.time and --storage.tsdb.retention.size;
    # whether the config file accepts the keys below depends on the
    # Prometheus version in use.
    storage:
      tsdb:
        retention.time: 15d
        retention.size: 512GB

Production Example

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: api-monitor
    spec:
      selector:
        matchLabels:
          app: api
      endpoints:
        - port: metrics
          interval: 30s
          path: /metrics
        - port: metrics
          interval: 10s
          path: /metrics/critical
          metricRelabelings:
            # Keep only the http_requests_total series from this endpoint
            - sourceLabels: [__name__]
              regex: 'http_requests_total'
              action: keep
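
To make the error-rate expression used in the alert rule more concrete, here is a minimal Python sketch of the arithmetic behind it. It is a simplification, not how Prometheus implements rate(): the real function also extrapolates to the edges of the range window. The names simple_rate and error_rate_percent and the sample values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# (unix timestamp in seconds, counter value) -- illustrative sample format
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the given samples.

    Simplified on purpose: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # A counter never decreases, so a drop means the process
            # restarted; the new value is the increase since the reset.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def error_rate_percent(errors: List[Sample], total: List[Sample]) -> float:
    """Share of requests that were 5xx, as a percentage."""
    total_rate = simple_rate(total)
    if total_rate == 0:
        return 0.0
    return simple_rate(errors) / total_rate * 100


if __name__ == "__main__":
    # Hypothetical samples covering a 5-minute (300 s) window.
    total = [(0.0, 1000.0), (150.0, 1600.0), (300.0, 2200.0)]
    errors = [(0.0, 10.0), (150.0, 40.0), (300.0, 85.0)]
    print(error_rate_percent(errors, total))  # 6.25
```

With these sample values the result is above the 5 percent threshold, so the HighErrorRate condition would hold; the alert would fire only if it stayed true for the full "for: 5m" period.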

1 min · Me

Service Mesh Architecture: Implementation and Best Practices

Service Mesh Components

Core Architecture

Control Plane
- Service discovery
- Configuration management
- Certificate management

Data Plane
- Traffic routing
- Load balancing
- Security enforcement

Implementation Patterns

Traffic Management

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: reviews-route
    spec:
      hosts:
        - reviews
      http:
        - match:
            - headers:
                end-user:
                  exact: jason
          route:
            - destination:
                host: reviews
                subset: v2
        - route:
            - destination:
                host: reviews
                subset: v3

Circuit Breaking

    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: reviews-cb-policy
    spec:
      host: reviews
      trafficPolicy:
        outlierDetection:
          consecutive5xxErrors: 7
          interval: 5m
          baseEjectionTime: 15m

Security Patterns

mTLS Configuration

    apiVersion: security.istio.io/v1beta1
    kind: PeerAuthentication
    metadata:
      name: default
      namespace: prod
    spec:
      mtls:
        mode: STRICT

Observability

Tracing Configuration

    apiVersion: telemetry.istio.io/v1alpha1
    kind: Telemetry
    metadata:
      name: mesh-default
    spec:
      tracing:
        - randomSamplingPercentage: 50
          customTags:
            env:
              literal:
                value: production

Production Example

    apiVersion: networking.istio.io/v1alpha3
    kind: Gateway
    metadata:
      name: prod-gateway
    spec:
      selector:
        istio: ingressgateway
      servers:
        - port:
            number: 443
            name: https
            protocol: HTTPS
          tls:
            mode: SIMPLE
            credentialName: prod-cert
          hosts:
            - "*.example.com"
    ---
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: prod-routes
    spec:
      hosts:
        - "*.example.com"
      gateways:
        - prod-gateway
      http:
        - match:
            - uri:
                prefix: /api/v1
          route:
            - destination:
                host: api-service
                subset: v1
                port:
                  number: 80
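
As a rough illustration of what the circuit-breaking settings in the DestinationRule ask the data plane to do, the following Python sketch models a single upstream host. It is a toy model under stated assumptions, not Envoy's or Istio's implementation: it ignores the interval analysis sweep and the way ejection time grows with repeated ejections. The class name OutlierTracker and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OutlierTracker:
    """Toy model of one host under consecutive-5xx outlier detection."""

    consecutive_5xx_errors: int = 7          # mirrors consecutive5xxErrors: 7
    base_ejection_seconds: float = 15 * 60   # mirrors baseEjectionTime: 15m
    streak: int = 0                          # current run of 5xx responses
    ejected_until: float = 0.0               # time at which the host may return

    def record(self, status: int, now: float) -> None:
        """Record one response status observed at time `now` (seconds)."""
        if 500 <= status <= 599:
            self.streak += 1
            if self.streak >= self.consecutive_5xx_errors:
                # Threshold reached: take the host out of rotation.
                self.ejected_until = now + self.base_ejection_seconds
                self.streak = 0
        else:
            # Any non-5xx response breaks the consecutive run.
            self.streak = 0

    def is_available(self, now: float) -> bool:
        """True if the host is currently eligible to receive traffic."""
        return now >= self.ejected_until


if __name__ == "__main__":
    host = OutlierTracker()
    for second in range(7):
        host.record(503, now=float(second))
    print(host.is_available(now=10.0))   # False: ejected after 7 straight 5xx
    print(host.is_available(now=6.0 + 15 * 60))  # True: ejection has elapsed
```

The point of the sketch is the reset behaviour: a single successful response clears the streak, so only an unbroken run of seven 5xx responses removes the host from the load-balancing pool.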

1 min · Me