While Kubernetes provides built-in scaling based on CPU and memory usage, real-world applications often need to scale based on business-specific metrics. Whether it’s database connections, queue length, or request latency, custom metrics scaling allows you to adapt your infrastructure to your application’s unique needs. Let’s explore how to implement this in a production environment.

Why Custom Metrics Scaling?

Traditional resource-based scaling (CPU/memory) often fails to capture the true load on your system. Consider these scenarios:

  1. A database service whose connection pool saturates while CPU and memory stay low
  2. A queue processor whose backlog keeps growing even though individual workers look lightly loaded
  3. An API service where response time degrades before resource usage reflects the pressure

In these cases, you need custom metrics scaling to truly match your infrastructure to demand.

Setting Up Custom Metrics Collection

Prometheus Adapter Configuration

The first step is configuring Prometheus Adapter to collect and expose your custom metrics to Kubernetes. Here’s how to set it up:

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-config
data:
  config.yaml: |
    rules:
    - seriesQuery: 'pg_stat_activity_count{state="active"}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: "pg_stat_activity_count"
        as: "postgresql_active_connections"
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)

This configuration:

  • Collects PostgreSQL active connection metrics
  • Maps them to Kubernetes namespaces and pods
  • Exposes them as a custom metric named postgresql_active_connections
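
On its own, this ConfigMap does nothing until Prometheus Adapter is running and pointed at it. The sketch below shows a minimal adapter Deployment under stated assumptions: the image tag, monitoring namespace, and Prometheus URL are placeholders to adjust for your cluster, and a complete install (for example, via the prometheus-adapter Helm chart) also needs RBAC and an APIService registration for custom.metrics.k8s.io, both omitted here.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  namespace: monitoring    # assumed namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-adapter
  template:
    metadata:
      labels:
        app: prometheus-adapter
    spec:
      containers:
      - name: prometheus-adapter
        # Pin a version you have validated; this tag is illustrative.
        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.2
        args:
        - --config=/etc/adapter/config.yaml
        # Assumed in-cluster Prometheus address; point this at your server.
        - --prometheus-url=http://prometheus.monitoring.svc:9090
        - --secure-port=6443
        ports:
        - containerPort: 6443
        volumeMounts:
        - name: config
          mountPath: /etc/adapter
      volumes:
      - name: config
        configMap:
          name: prometheus-adapter-config

Once the APIService is registered, the renamed metric becomes available under the custom.metrics.k8s.io API group, which is where the HorizontalPodAutoscaler reads it from.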

Implementing the Metrics Exporter

To collect PostgreSQL metrics, we need to deploy a metrics exporter:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-exporter
spec:
  selector:
    matchLabels:
      app: postgres-exporter
  template:
    metadata:
      labels:
        app: postgres-exporter
    spec:
      containers:
      - name: postgres-exporter
        # The original wrouesnel image is unmaintained; pin the community successor instead of :latest.
        image: quay.io/prometheuscommunity/postgres-exporter:v0.15.0
        env:
        # In production, pull the connection string from a Secret rather than hardcoding credentials.
        - name: DATA_SOURCE_NAME
          value: "postgresql://postgres:password@postgres:5432/postgres?sslmode=disable"
        ports:
        - containerPort: 9187
          name: metrics

The exporter connects to PostgreSQL and exposes metrics in Prometheus format, which can then be collected and used for scaling decisions.
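
For Prometheus (and the ServiceMonitor shown later) to find the exporter, it needs a Service in front of it. Here is a minimal sketch; the app: postgres label is an assumption chosen to match the ServiceMonitor selector used in the production setup below, and the names are illustrative rather than required.

apiVersion: v1
kind: Service
metadata:
  name: postgres-exporter
  labels:
    app: postgres    # matched by the ServiceMonitor selector later in this post
spec:
  selector:
    app: postgres-exporter    # targets the exporter pods from the Deployment above
  ports:
  - name: metrics             # the ServiceMonitor scrapes this named port
    port: 9187
    targetPort: metrics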

Implementing Horizontal Pod Autoscaling

Basic Custom Metrics HPA

Here’s how to implement horizontal pod autoscaling based on custom metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-postgres-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: postgresql_active_connections
      target:
        type: AverageValue
        averageValue: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 300

Key features of this configuration:

  • Scales on the average number of active connections per pod (target of 100)
  • Conservative scale-down: at most one pod every five minutes, after a five-minute stabilization window
  • Faster scale-up: up to two pods per minute, with a one-minute stabilization window to absorb spikes

Queue-Based Scaling Example

For message queue-based applications, we can scale based on queue depth:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-consumer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-consumer
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: rabbitmq_queue_messages
        selector:
          matchLabels:
            queue: work-queue
      target:
        type: AverageValue
        averageValue: 100

This configuration scales queue consumers based on the number of messages waiting in the queue.
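
One caveat: External metrics such as rabbitmq_queue_messages are not served automatically. Prometheus Adapter needs a matching entry in the externalRules section of its config.yaml. A minimal sketch, assuming a RabbitMQ exporter is already publishing the series to Prometheus with a queue label:

externalRules:
- seriesQuery: 'rabbitmq_queue_messages{queue!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
  # Keep the queue label so the HPA's matchLabels selector can filter on it.
  metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>}) by (queue)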

Production-Ready Implementation

Let’s look at a complete production setup that combines multiple scaling metrics with monitoring and alerting:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
spec:
  selector:
    matchLabels:
      app: postgres
  endpoints:
  - port: metrics
    interval: 15s

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: database-alerts
spec:
  groups:
  - name: database.rules
    rules:
    - alert: HighConnectionCount
      # Alert on the underlying Prometheus series; the renamed metric exists only in the custom metrics API.
      expr: sum(pg_stat_activity_count{state="active"}) > 1000
      for: 5m
      labels:
        severity: warning
      annotations:
        description: Database connection count is high

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: database-service
  minReplicas: 3
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: postgresql_active_connections
      target:
        type: AverageValue
        averageValue: 100
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-ingress
      target:
        type: Value
        value: 1k

This production setup provides:

  1. Comprehensive Monitoring: Metrics scraped every 15 seconds via the ServiceMonitor
  2. Proactive Alerting: Warnings on high connection counts before they become critical
  3. Multi-Metric Scaling: Considers connections, CPU usage, and request rate (the Ingress-backed requests-per-second metric needs its own adapter rule; see the sketch after this list)
  4. High Availability: A minimum of 3 replicas for redundancy
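
One piece this setup still depends on: the requests-per-second Object metric has to be exposed by the adapter, mapped onto the Ingress object. If you run ingress-nginx, a rule along the following lines could do it; the series and label names are assumptions based on the ingress-nginx controller's metrics, so verify them against what your controller actually exports.

rules:
- seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
      ingress:
        group: networking.k8s.io
        resource: ingress
  name:
    matches: "^nginx_ingress_controller_requests$"
    as: "requests-per-second"
  # Convert the raw request counter into a per-second rate over a 2m window.
  metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)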

Operational Considerations

When implementing custom metrics scaling, keep these factors in mind:

  1. Metric Selection:

    • Choose metrics that directly correlate with user experience
    • Avoid noisy metrics that could cause unnecessary scaling
    • Consider the cost implications of your scaling decisions
  2. Scaling Behavior:

    • Set appropriate stabilization windows to prevent thrashing
    • Configure conservative scale-down behavior
    • Test scaling behavior under various load patterns
  3. Monitoring and Alerting:

    • Monitor the scaling decisions being made
    • Alert on unexpected scaling events (see the example after this list)
    • Track the correlation between metrics and actual load
  4. Cost Control:

    • Set reasonable maximum replica limits
    • Monitor resource utilization across scaled pods
    • Consider implementing cost allocation by namespace
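
For the alerting point above, one concrete option is to watch the autoscaler itself. Assuming kube-state-metrics is installed (it exports the kube_horizontalpodautoscaler_* series used here), the following rule fires when an HPA has been pinned at its maximum for 15 minutes, a sign that the replica ceiling rather than the workload is deciding capacity:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: autoscaler-alerts
spec:
  groups:
  - name: autoscaling.rules
    rules:
    - alert: HPAMaxedOut
      # Current replicas equal the configured maximum, and have for 15 minutes.
      expr: kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas
      for: 15m
      labels:
        severity: warning
      annotations:
        description: HPA {{ $labels.horizontalpodautoscaler }} has been at its maximum replica count for 15 minutes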

Conclusion

Custom metrics scaling in Kubernetes provides the flexibility to scale your applications based on metrics that truly matter to your business. While it requires more setup than basic CPU/memory scaling, the ability to scale based on application-specific metrics can significantly improve both performance and resource efficiency.

Start with a single, well-understood metric and gradually add complexity as needed. Monitor your scaling behavior closely and adjust thresholds based on real-world performance data.