While Kubernetes provides built-in scaling based on CPU and memory usage, real-world applications often need to scale based on business-specific metrics. Whether it’s database connections, queue length, or request latency, custom metrics scaling allows you to adapt your infrastructure to your application’s unique needs. Let’s explore how to implement this in a production environment.
Why Custom Metrics Scaling?
Traditional resource-based scaling (CPU/memory) often fails to capture the true load on your system. Consider these scenarios:
- A database service with many idle connections
- A queue processor that needs to scale based on queue depth
- An API service where response time is critical
In these cases, you need custom metrics scaling to truly match your infrastructure to demand.
Setting Up Custom Metrics Collection
Prometheus Adapter Configuration
The first step is configuring the Prometheus Adapter, which turns queries against Prometheus into metrics that the Kubernetes custom metrics API can serve. Here’s how to set it up:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-adapter-config
data:
  config.yaml: |
    rules:
    # Discover the active-connection series exposed by the postgres exporter
    - seriesQuery: 'pg_stat_activity_count{state="active"}'
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      # Rename the series for the custom metrics API
      name:
        matches: "pg_stat_activity_count"
        as: "postgresql_active_connections"
      metricsQuery: sum(pg_stat_activity_count{state="active",<<.LabelMatchers>>}) by (<<.GroupBy>>)
This configuration:
- Selects the PostgreSQL active-connection series (pg_stat_activity_count with state="active") from Prometheus
- Associates it with Kubernetes namespaces and pods
- Exposes it as a custom metric named postgresql_active_connections
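For the adapter to pick these rules up, its Deployment has to mount the ConfigMap and know where Prometheus lives. The fragment below is a minimal sketch assuming a self-managed adapter Deployment and a Prometheus service reachable at prometheus.monitoring.svc:9090; if you install the adapter through its Helm chart, the chart generates the equivalent wiring from your values file:

# Fragment of a prometheus-adapter Deployment spec (names and version are illustrative)
spec:
  template:
    spec:
      containers:
      - name: prometheus-adapter
        image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.11.2  # pin a release you have validated
        args:
        - --config=/etc/adapter/config.yaml
        - --prometheus-url=http://prometheus.monitoring.svc:9090
        volumeMounts:
        - name: config
          mountPath: /etc/adapter
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: prometheus-adapter-config

Once the adapter restarts with the new rules, the metric should show up under the custom.metrics.k8s.io API group, which is exactly what the HPA controller queries.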
Implementing the Metrics Exporter
To collect PostgreSQL metrics, we need to deploy a metrics exporter:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-exporter
spec:
  selector:
    matchLabels:
      app: postgres-exporter
  template:
    metadata:
      labels:
        app: postgres-exporter
    spec:
      containers:
      - name: postgres-exporter
        image: wrouesnel/postgres_exporter:latest
        env:
        - name: DATA_SOURCE_NAME
          # Credentials are inlined here for brevity; see the Secret-based variant below
          value: "postgresql://postgres:password@postgres:5432/postgres?sslmode=disable"
        ports:
        - containerPort: 9187
          name: metrics
The exporter connects to PostgreSQL and exposes metrics in Prometheus format, which can then be collected and used for scaling decisions.
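Prometheus still needs something to scrape, so the exporter should sit behind a Service. The sketch below assumes the pod label app: postgres-exporter used above, and it also carries the app: postgres label that the ServiceMonitor in the production section selects on:

apiVersion: v1
kind: Service
metadata:
  name: postgres-exporter
  labels:
    app: postgres            # matched by the ServiceMonitor shown later
spec:
  selector:
    app: postgres-exporter   # matches the exporter pod labels
  ports:
  - name: metrics
    port: 9187
    targetPort: metrics

In a real deployment you would also replace the plain-text connection string with a Secret reference; the Secret name and key below are illustrative:

# Drop-in replacement for the env block in the Deployment above
env:
- name: DATA_SOURCE_NAME
  valueFrom:
    secretKeyRef:
      name: postgres-exporter-dsn
      key: dsn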
Implementing Horizontal Pod Autoscaling
Basic Custom Metrics HPA
Here’s how to implement horizontal pod autoscaling based on custom metrics:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-postgres-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: postgres-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: postgresql_active_connections
      target:
        type: AverageValue
        averageValue: 100
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 300
Key features of this configuration:
- Scales based on average active connections per pod
- Conservative scale-down behavior to prevent thrashing
- Quick scale-up response for sudden load spikes
Queue-Based Scaling Example
For message queue-based applications, we can scale based on queue depth:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-consumer-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-consumer
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: rabbitmq_queue_messages
        selector:
          matchLabels:
            queue: work-queue
      target:
        type: AverageValue
        averageValue: 100
This configuration scales the consumer Deployment so that each pod handles, on average, no more than about 100 messages waiting in the queue.
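For this to work, the external metric has to exist in the first place: the Prometheus Adapter needs an externalRules entry that publishes the queue depth through the external.metrics.k8s.io API. A minimal sketch, assuming a RabbitMQ exporter that exposes a rabbitmq_queue_messages series with a queue label, added to the same config.yaml as before:

externalRules:
- seriesQuery: 'rabbitmq_queue_messages{queue!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
  name:
    matches: "rabbitmq_queue_messages"
    as: "rabbitmq_queue_messages"
  metricsQuery: sum(rabbitmq_queue_messages{<<.LabelMatchers>>}) by (<<.GroupBy>>)

The HPA's matchLabels selector (queue: work-queue) is folded into the Prometheus query via <<.LabelMatchers>>, so each consumer Deployment can target its own queue.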
Production-Ready Implementation
Let’s look at a complete production setup that combines multiple scaling metrics with monitoring and alerting:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-metrics
spec:
  selector:
    matchLabels:
      app: postgres
  endpoints:
  - port: metrics
    interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: database-alerts
spec:
  groups:
  - name: database.rules
    rules:
    - alert: HighConnectionCount
      expr: sum(pg_stat_activity_count{state="active"}) > 1000
      for: 5m
      labels:
        severity: warning
      annotations:
        description: Database connection count is high
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-scaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: database-service
  minReplicas: 3
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: postgresql_active_connections
      target:
        type: AverageValue
        averageValue: 100
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-ingress
      target:
        type: Value
        value: 1k
This production setup provides:
- Comprehensive Monitoring: Regular collection of metrics with ServiceMonitor
- Proactive Alerting: Alerts for high connection counts before they become critical
- Multi-Metric Scaling: Considers connections, CPU usage, and request rate
- High Availability: Minimum of 3 replicas for redundancy
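When several metrics are configured, the HPA computes a desired replica count for each one and applies the largest, so connections, CPU, and request rate act as independent floors rather than being averaged. Note that the requests-per-second Object metric is not built in either: it needs its own adapter rule that attaches a request-rate series to the Ingress object. A sketch assuming the ingress-nginx controller's nginx_ingress_controller_requests counter:

# Additional prometheus-adapter rule mapping ingress request rate onto the Ingress object
rules:
- seriesQuery: 'nginx_ingress_controller_requests{namespace!="",ingress!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
      ingress:
        resource: ingress
  name:
    matches: "^nginx_ingress_controller_requests$"
    as: "requests-per-second"
  metricsQuery: sum(rate(nginx_ingress_controller_requests{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)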
Operational Considerations
When implementing custom metrics scaling, keep these factors in mind:
- Metric Selection:
  - Choose metrics that directly correlate with user experience
  - Avoid noisy metrics that could cause unnecessary scaling
  - Consider the cost implications of your scaling decisions
- Scaling Behavior:
  - Set appropriate stabilization windows to prevent thrashing
  - Configure conservative scale-down behavior
  - Test scaling behavior under various load patterns
- Monitoring and Alerting:
  - Monitor the scaling decisions being made
  - Alert on unexpected scaling events
  - Track the correlation between metrics and actual load
- Cost Control:
  - Set reasonable maximum replica limits
  - Monitor resource utilization across scaled pods
  - Consider implementing cost allocation by namespace (a ResourceQuota sketch follows this list)
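On the cost-control point, a per-namespace ResourceQuota puts a hard ceiling on what scaled-out workloads can consume, regardless of how high any individual HPA's maxReplicas is set. A minimal sketch with illustrative limits:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: scaling-budget
  namespace: database        # illustrative namespace
spec:
  hard:
    pods: "30"               # hard cap regardless of HPA settings
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi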
Conclusion
Custom metrics scaling in Kubernetes provides the flexibility to scale your applications based on metrics that truly matter to your business. While it requires more setup than basic CPU/memory scaling, the ability to scale based on application-specific metrics can significantly improve both performance and resource efficiency.
Start with a single, well-understood metric and gradually add complexity as needed. Monitor your scaling behavior closely and adjust thresholds based on real-world performance data.