Mastering KEDA on GKE: A Deep Dive into Event-Driven Autoscaling

Event Driven Scaling and How to Fix It When It Breaks

DevOps Engineer with hands-on experience across GCP, AWS, and Azure. I thrive in complex systems, solving tough problems, automating workflows, and creating efficient, scalable solutions. Known for digging deep to understand the “why” and building reliable, elegant systems. Passionate about continuous learning, multi-cloud solutions, and turning challenges into streamlined results.

If you are running workloads on Google Kubernetes Engine (GKE), you are likely familiar with the Horizontal Pod Autoscaler (HPA). HPA is great for scaling based on standard CPU or memory metrics.

But as your architecture matures, standard metrics often aren't enough. You need to scale based on business realities—and that is where KEDA (Kubernetes Event-driven Autoscaling) comes into the picture.

Why KEDA? (When GKE Native Scaling Isn't Enough)

If you have worked extensively with GKE, you might be thinking: Doesn't GKE already support scaling on Pub/Sub or Cloud Tasks via the Custom Metrics Stackdriver Adapter?

Yes, it does. If your entire architecture lives perfectly within Google Cloud's walled garden and all your metrics are piped into Cloud Monitoring, GKE's native custom metrics adapter works fine.

However, real-world microservices rarely stay in that box. GKE's native scaling hits a hard wall when you need to scale based on metrics not controlled by Google Cloud. For example:

  • Scaling based on the number of active WebSocket connections exposed directly by a pod's internal metrics.

  • Scaling based on a raw SQL query evaluating active user sessions in a Postgres database.

  • Scaling off a legacy RabbitMQ or Kafka cluster hosted outside of GCP.

  • Scaling a video encoding API service: if the backlog of pending encoding tasks in your queue exceeds 4-5, KEDA can quickly add heavy-compute worker pods to process the videos, then scale them back down to zero once the queue is empty.

In these scenarios, KEDA acts as a universal translator. It pulls external metrics from dozens of different sources and feeds them to the HPA through the external metrics API, and it can even scale workloads all the way down to zero.

The Prerequisites and The "Clash": KEDA vs. Prometheus Adapter

To use KEDA effectively on GKE, you need a metrics source. For many teams, the prerequisite is having Prometheus installed to scrape application-level metrics.

Historically, to scale on Prometheus metrics, you would install the prometheus-adapter. Do not do this if you plan to use KEDA.

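Kubernetes has three aggregated API groups for autoscaling metrics:

  • metrics.k8s.io (resource metrics such as CPU and memory, usually served by metrics-server)

  • custom.metrics.k8s.io (metrics tied to Kubernetes objects)

  • external.metrics.k8s.io (metrics from sources outside the cluster)

Each of these API groups can be backed by only one service at a time. KEDA registers itself for external.metrics.k8s.io, so if you install prometheus-adapter (or the Datadog Cluster Agent) configured to serve that same API alongside KEDA, the two will clash.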

The Solution: Let KEDA replace your prometheus-adapter entirely. Because KEDA has a built-in Prometheus scaler, it acts as a unified external metrics server. KEDA takes over the external metrics API and handles your Prometheus metrics, your Pub/Sub queues, and your database queries all under one roof. In practice, this means the external.metrics.k8s.io APIService must point at keda/keda-operator-metrics-apiserver.

How to Integrate KEDA in GKE

Deploying KEDA is straightforward via Helm, but requires setting up the right permissions if you are accessing cloud-secured resources (like Google Cloud Pub/Sub or Cloud Tasks). You should never use hardcoded service account keys. Instead, leverage GKE Workload Identity.

Step 1: Install KEDA

Add the repository and install it into a dedicated namespace:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace

Step 2: Configure GCP Workload Identity (The Secure Way)

To allow KEDA to read external GCP metrics, you need to bind a Google Service Account (GSA) that has the necessary IAM permissions (like roles/monitoring.viewer for Cloud Monitoring metrics, or roles/pubsub.viewer) to a Kubernetes Service Account (KSA). With podIdentity, it is the KEDA operator pod that calls the GCP APIs, so the KSA to bind is the operator's own: keda-operator in the keda namespace when you use the Helm install above.

# 1. Confirm the operator's KSA created by the Helm chart exists
kubectl get serviceaccount keda-operator -n keda

# 2. Annotate the KSA with your Google Service Account email
kubectl annotate serviceaccount keda-operator \
    iam.gke.io/gcp-service-account=<YOUR_GCP_SERVICE_ACCOUNT_EMAIL> \
    -n keda

# 3. Bind the GSA to the KSA via IAM policy
gcloud iam service-accounts add-iam-policy-binding <YOUR_GCP_SERVICE_ACCOUNT_EMAIL> \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:<YOUR_PROJECT_ID>.svc.id.goog[keda/keda-operator]"

Step 3: Create a TriggerAuthentication

This custom resource decouples your credentials from your scaling logic. It tells KEDA to use the GCP Workload Identity provider we just set up.

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: gcp-workload-identity-auth
  namespace: backend-apps
spec:
  podIdentity:
    provider: gcp

If you are not using Workload Identity on GKE, the fallback is to create a key file (gcp-sa-key.json) for the service account and store it as a Kubernetes Secret:

kubectl create secret generic gcp-credentials-secret \
    --from-file=GOOGLE_APPLICATION_CREDENTIALS_JSON=gcp-sa-key.json \
    -n backend-apps

Then point a TriggerAuthentication at that secret:

apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: gcp-secret-auth
  namespace: backend-apps
spec:
  secretTargetRef:
    - parameter: GoogleApplicationCredentials 
      name: gcp-credentials-secret
      key: GOOGLE_APPLICATION_CREDENTIALS_JSON

Step 4: Create a ScaledObject

The ScaledObject tells KEDA what to scale and which metric to watch.

If you are scaling on an in-cluster metric served by Prometheus, you define the query directly. This takes two steps:

1. Expose the metric from your application code so that Prometheus can scrape it.
2. Query that metric from your Prometheus server in the ScaledObject.
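Whichever client library you use, step 1 comes down to serving plain text in the Prometheus exposition format at an endpoint Prometheus can scrape. As a rough, dependency-free sketch (the renderGauge helper and the sample value are hypothetical and not part of prom-client), the payload for the gauge used in this article would look like this:

```javascript
// Hypothetical helper: renders one gauge in the Prometheus text exposition format.
// A client library such as prom-client normally generates this for you.
function renderGauge(name, help, value) {
  return [
    `# HELP ${name} ${help}`,
    `# TYPE ${name} gauge`,
    `${name} ${value}`,
    '', // join() turns this into the trailing newline
  ].join('\n');
}

const payload = renderGauge(
  'websocket_connections',
  'Current number of active WebSocket connections',
  42 // sample value, purely illustrative
);
console.log(payload);
```

In a real service this text comes from the client library's registry, as in the prom-client snippet that follows.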

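const client = require('prom-client');

/* ---------- Prometheus ---------- */
// Dedicated registry, so the metrics endpoint serves only what is registered here.
const register = new client.Registry();
client.collectDefaultMetrics({ register });

// Gauge for open WebSocket connections: call inc() on connect and dec() on close.
const activeWsConnections = new client.Gauge({
  name: 'websocket_connections',
  help: 'Current number of active WebSocket connections',
  registers: [register], // without this, the gauge goes to the default global registry
});

The ScaledObject then runs a query against this metric on your Prometheus server:

apiVersion: keda.sh/v1alpha1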
kind: ScaledObject
metadata:
  name: websocket-scaler
  namespace: backend-apps
spec:
  scaleTargetRef:
    name: websocket-server-deployment
  minReplicaCount: 1
  maxReplicaCount: 50
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc.cluster.local:9090
      metricName: websocket_connections
      threshold: '100'
      query: sum(websocket_connections{namespace="backend-apps"})
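To see what threshold: '100' means in practice, here is a simplified sketch of the replica arithmetic the HPA applies to an average-value target. The function name and the sample numbers are illustrative, and the real HPA also applies a tolerance band and stabilization windows that are left out here.

```javascript
// Simplified replica math for an average-value target:
// enough pods that each one handles at most `thresholdPerPod` of the total.
function desiredReplicas(metricTotal, thresholdPerPod, minReplicas, maxReplicas) {
  if (thresholdPerPod <= 0) {
    throw new RangeError('thresholdPerPod must be positive');
  }
  const raw = Math.ceil(metricTotal / thresholdPerPod);
  // Clamp to the bounds set by minReplicaCount and maxReplicaCount.
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// 730 open connections, threshold 100, bounds 1..50 as in the ScaledObject above
console.log(desiredReplicas(730, 100, 1, 50));  // 8
console.log(desiredReplicas(0, 100, 1, 50));    // 1 (held at minReplicaCount)
console.log(desiredReplicas(9000, 100, 1, 50)); // 50 (capped at maxReplicaCount)
```

Because the query sums connections across all pods, the threshold behaves as a per-pod target: the total is divided by 100 and rounded up.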

If you are scaling on a secure external metric like GCP Pub/Sub, you reference the TriggerAuthentication from Step 3 so KEDA can access the queue:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: pubsub-scaler
  namespace: backend-apps
spec:
  scaleTargetRef:
    name: worker-deployment
  minReplicaCount: 1  
  maxReplicaCount: 20
  triggers:
  - type: gcp-stackdriver
    metadata:
      projectId: <YOUR_PROJECT_ID>
      # The metric type and the subscription both go into the Cloud Monitoring filter
      filter: 'metric.type="pubsub.googleapis.com/subscription/num_undelivered_messages" AND resource.labels.subscription_id="my-task-subscription"'
      targetValue: '5'
    authenticationRef:
      name: gcp-workload-identity-auth

Troubleshooting: Why Isn't KEDA Working?

When KEDA fails, it often fails quietly, with no obvious error on the workload itself. If your pods aren't scaling, use these experience-backed steps to identify the root cause.

1. The External Metrics API Clash

The Problem: Your HPAs report errors such as being unable to fetch metrics, or you notice erratic scaling because KEDA is fighting another adapter.

The Fix: Check which service currently owns the external metrics API.

kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml | grep -A 5 "service:"

If the output shows anything other than keda/keda-operator-metrics-apiserver, you have a clash. You must uninstall the competing adapter (like prometheus-adapter) to let KEDA take control.
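If you want to script this check, a small sketch follows. It assumes you pass in the parsed output of kubectl get apiservice v1beta1.external.metrics.k8s.io -o json and that KEDA was installed with the names used in this article; the function name and the sample object are hypothetical.

```javascript
const KEDA_OWNER = 'keda/keda-operator-metrics-apiserver';

// Reports which namespace/service backs the external metrics API.
function externalMetricsOwner(apiService) {
  const svc = (apiService && apiService.spec && apiService.spec.service) || {};
  const owner = `${svc.namespace || '?'}/${svc.name || '?'}`;
  return { owner, ownedByKeda: owner === KEDA_OWNER };
}

// Illustrative object showing a clash with prometheus-adapter.
const clashingApiService = {
  spec: { service: { namespace: 'monitoring', name: 'prometheus-adapter' } },
};
console.log(externalMetricsOwner(clashingApiService).ownedByKeda); // false
```

Returning the owner string alongside the boolean makes it easy to report which adapter needs to be removed.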

2. Conflicting HPA Configurations

The Problem: You manually created an HPA, and then created a KEDA ScaledObject for the same deployment. The deployment scales up, and the new pods are killed almost immediately as the two HPAs fight over the replica count.

The Fix: KEDA dynamically creates and manages its own HPA under the hood. Find and delete the manual one.

# Find all HPAs targeting your deployment
kubectl get hpa -n <your-namespace>

# Delete the rogue HPA that wasn't created by KEDA
kubectl delete hpa <rogue-hpa-name> -n <your-namespace>
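In a namespace with many HPAs, spotting the rogue one by eye gets tedious. The sketch below filters the items array from kubectl get hpa -n <your-namespace> -o json. It assumes that KEDA-managed HPAs carry an ownerReference of kind ScaledObject; the function name and sample data are hypothetical.

```javascript
// Returns the names of HPAs that target `deploymentName`
// but are not owned by a KEDA ScaledObject.
function findRogueHpas(hpaItems, deploymentName) {
  return hpaItems
    .filter((hpa) => hpa.spec.scaleTargetRef.name === deploymentName)
    .filter((hpa) => {
      const owners = hpa.metadata.ownerReferences || [];
      return !owners.some((ref) => ref.kind === 'ScaledObject');
    })
    .map((hpa) => hpa.metadata.name);
}

// Illustrative data: one KEDA-managed HPA, one manual HPA, one unrelated HPA.
const sampleHpas = [
  {
    metadata: {
      name: 'keda-hpa-pubsub-scaler',
      ownerReferences: [{ kind: 'ScaledObject', name: 'pubsub-scaler' }],
    },
    spec: { scaleTargetRef: { kind: 'Deployment', name: 'worker-deployment' } },
  },
  {
    metadata: { name: 'worker-hpa' },
    spec: { scaleTargetRef: { kind: 'Deployment', name: 'worker-deployment' } },
  },
  {
    metadata: { name: 'api-hpa' },
    spec: { scaleTargetRef: { kind: 'Deployment', name: 'api-deployment' } },
  },
];
console.log(findRogueHpas(sampleHpas, 'worker-deployment')); // [ 'worker-hpa' ]
```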

3. Authentication and Permission Denied Errors

The Problem: Your ScaledObject is active, but KEDA cannot read the target queue or GCP service due to IAM issues.

The Fix: Tail the logs of the KEDA operator to look for 403 Forbidden or Unauthorized errors.

kubectl logs -l app=keda-operator -n keda --tail=100 | grep -i error

If using GKE Workload Identity, verify the binding between the Google Service Account and the KEDA Kubernetes Service Account:

gcloud iam service-accounts get-iam-policy <GOOGLE_SERVICE_ACCOUNT_EMAIL> \
    --flatten="bindings[].members" \
    --format='table(bindings.role)' \
    --filter="bindings.members:serviceAccount:<PROJECT_ID>.svc.id.goog[keda/keda-operator]"
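The same check can be done in code against the full policy. The sketch below assumes you pass in the parsed output of gcloud iam service-accounts get-iam-policy <GOOGLE_SERVICE_ACCOUNT_EMAIL> --format=json; the function names and the my-project ID are hypothetical.

```javascript
// Builds the Workload Identity member string for a Kubernetes Service Account.
function workloadIdentityMember(projectId, namespace, ksaName) {
  return `serviceAccount:${projectId}.svc.id.goog[${namespace}/${ksaName}]`;
}

// True if the policy grants roles/iam.workloadIdentityUser to that member.
function hasWorkloadIdentityBinding(policy, member) {
  return (policy.bindings || []).some(
    (b) =>
      b.role === 'roles/iam.workloadIdentityUser' &&
      (b.members || []).includes(member)
  );
}

// Illustrative policy with the binding in place.
const member = workloadIdentityMember('my-project', 'keda', 'keda-operator');
const samplePolicy = {
  bindings: [{ role: 'roles/iam.workloadIdentityUser', members: [member] }],
};
console.log(member); // serviceAccount:my-project.svc.id.goog[keda/keda-operator]
console.log(hasWorkloadIdentityBinding(samplePolicy, member)); // true
```

A common mistake is a binding that names the wrong namespace or KSA, which this exact-match comparison will surface as false.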

4. Silent Failures in ScaledObject Configuration

The Problem: You applied the YAML, but absolutely nothing happens. The HPA isn't even created.

The Fix: Check the actual status and events of your ScaledObject. KEDA will report configuration errors (like misspelled deployment names or unreachable Prometheus servers) here.

kubectl describe scaledobject <scaledobject-name> -n <your-namespace>

Look at the Conditions section at the bottom of the output. You want to see Ready: True.
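For automated checks (in CI, for example), the same condition can be read from the JSON form of the resource. This sketch assumes the parsed output of kubectl get scaledobject <scaledobject-name> -n <your-namespace> -o json; the function name and the sample message are hypothetical.

```javascript
// Summarizes the Ready condition of a ScaledObject.
function readyStatus(scaledObject) {
  const conditions =
    (scaledObject.status && scaledObject.status.conditions) || [];
  const ready = conditions.find((c) => c.type === 'Ready');
  if (!ready) {
    return { ready: false, detail: 'no Ready condition reported yet' };
  }
  return {
    ready: ready.status === 'True',
    detail: ready.message || ready.reason || '',
  };
}

// Illustrative object with a failing Ready condition.
const brokenScaledObject = {
  status: {
    conditions: [
      { type: 'Ready', status: 'False', message: 'target deployment not found' },
      { type: 'Active', status: 'False' },
    ],
  },
};
console.log(readyStatus(brokenScaledObject).ready); // false
```

Only the Ready condition is treated as a failure signal here. Active being False simply means no trigger is currently firing.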

5. Impatience with the Cooldown Period

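The Problem: The queue is empty, the WebSocket connections are at zero, but KEDA is still keeping your pods running.

The Fix: By default, KEDA waits 5 minutes (300 seconds) after the last active trigger before scaling down to zero, to prevent rapid pod "flapping." Note that cooldownPeriod only governs the final step to zero replicas, so it has no effect while minReplicaCount is 1 or higher; scale-down between N and 1 replicas follows the HPA's own stabilization window. If you are testing and want to speed this up, patch the ScaledObject to reduce the cooldown: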

kubectl patch scaledobject <scaledobject-name> -n <your-namespace> \
  --type='merge' \
  -p '{"spec":{"cooldownPeriod": 30}}'
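The decision behind the cooldown can be sketched in a few lines. This is a simplification of KEDA's behavior with illustrative names, meant only to show why cooldownPeriod matters solely when minReplicaCount is 0.

```javascript
// Simplified scale-to-zero decision: the workload may drop to zero only if
// zero is allowed and no trigger has been active for the whole cooldown.
function canScaleToZero({ minReplicaCount, cooldownPeriodSeconds, lastActiveMs, nowMs }) {
  if (minReplicaCount > 0) return false; // the cooldown never applies above zero
  return nowMs - lastActiveMs >= cooldownPeriodSeconds * 1000;
}

const now = 1_000_000; // arbitrary timestamps in milliseconds
const idleFor45s = now - 45_000;

// Default cooldown (300 s): still waiting after 45 s of inactivity.
console.log(canScaleToZero({ minReplicaCount: 0, cooldownPeriodSeconds: 300, lastActiveMs: idleFor45s, nowMs: now })); // false
// Patched cooldown (30 s): eligible to scale to zero.
console.log(canScaleToZero({ minReplicaCount: 0, cooldownPeriodSeconds: 30, lastActiveMs: idleFor45s, nowMs: now })); // true
// minReplicaCount of 1: never scales to zero, whatever the cooldown.
console.log(canScaleToZero({ minReplicaCount: 1, cooldownPeriodSeconds: 30, lastActiveMs: idleFor45s, nowMs: now })); // false
```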

Conclusion: Embracing the Event-Driven Future

Moving beyond standard CPU and memory metrics is a necessary step in maturing your Kubernetes architecture. While GKE's native tools are a great starting point, KEDA unlocks the true potential of event-driven microservices. It lets you scale based on the metrics that actually matter to your application's performance and your business's bottom line—whether that is queue length, active WebSocket connections, or external database queries.

Yes, introducing KEDA means navigating potential API clashes, replacing legacy adapters, and keeping a very close eye on your HPA configurations. But once you have your ScaledObjects dialed in and your Workload Identity secure, the ability to proactively scale—and gracefully scale to zero to save costs—makes the initial learning curve entirely worth it.

This is just the first deep dive in my ongoing Kubernetes series here on Hashnode. We will keep exploring the advanced tools and tactics needed to build truly resilient and efficient clusters.
