Skip to main content

Command Palette

Search for a command to run...

The "No" Button: Stopping Bad Configs Before They Wake You Up at 3 AM

Updated
8 min read
The "No" Button: Stopping Bad Configs Before They Wake You Up at 3 AM
M

DevOps Engineer by day, YAML debugger by night. I help turn “it works on my machine” into “it works in production” by automating infrastructure, building CI/CD pipelines, and keeping cloud systems happy and scalable. I enjoy breaking things safely (in staging), fixing them properly (in prod), and writing about real-world DevOps lessons—what worked, what didn’t, and what I wish I knew earlier. If it involves Docker, Kubernetes, or reducing pager alerts, I’m probably interested.

You get a Slack message at 8 PM. A critical production service is down. After an hour of frantic debugging, you find the cause: a developer deployed a new version without resource limits, and the pod went rogue, triggering a cascading failure. We've all been there. Managing multiple Kubernetes clusters and development teams can feel like a constant battle against inconsistency and overlooked best practices.

In the previous two articles in this series, we covered the foundations of policy-as-code and explored the magic of mutation policies for automatically correcting configurations. Now, we turn our attention to the most powerful tool in the policy engine arsenal: Validation Policies. This article will focus on the most impactful and perhaps counter-intuitive uses of Validation Policies with Kyverno, presenting them not as a tool for enforcement, but for empowerment.

Guardrails, not Gates: Helping Devs Without Being a Blocker

The most effective policy strategy isn't about building gates that simply block developers when they do something wrong. It's about creating "guardrails" that gently guide them toward best practices, making the right way the easy way.

This philosophy is where Kyverno’s simplicity shines. Unlike tools that require learning a specialized language like Rego, Kyverno uses simple, declarative YAML for policy definitions. This approachability makes it easier for platform and development teams to collaborate on policy creation and maintenance. Instead of a complex, opaque language owned by a central team, policies become shared artifacts that everyone can understand and contribute to.

A great analogy for this is a validated drop-down list in a web form. The dropdown is a preventive control; it stops you from entering bad data from the very start. But it does so in a helpful, intuitive way by showing you the valid options. A good Kyverno policy works the same way. It prevents a misconfigured resource from ever entering the cluster, but when paired with clear error messages, it teaches the developer what needs to be fixed.

This turns policy management from a confrontational process into a collaborative one, improving both security posture and developer velocity. This isn't just about developer happiness; it's about shipping more secure code, faster, by shrinking the feedback loop from days (in a security review) to seconds (at kubectl apply).

The "Root" of all Evil: Why We Block Root Users

One of the most fundamental Kubernetes security best practices is to prevent containers from running as the root user. Allowing root access inside a container is a major security risk, opening the door to privilege escalation attacks if a vulnerability is exploited. Enforcing this rule manually across hundreds or thousands of workloads is impossible.

This is a perfect use case for a preventive validation policy. With a few lines of YAML, you can create a cluster-wide rule that automatically blocks any pod that attempts to run as root.

Here is a simple Kyverno ClusterPolicy that uses a Common Expression Language (CEL) expression to validate that pods are configured to run as a non-root user:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-root-user
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-run-as-non-root
      match:
        resources:
          kinds:
            - Pod
      validate:
        cel:
          expression: "object.spec.securityContext.runAsNonRoot == true"

This simple, declarative rule is incredibly powerful. It enforces a critical security standard across the entire cluster without any manual intervention. It's a classic example of a powerful preventive control that eliminates an entire class of security risks before a workload is even scheduled.

Audit Mode: How to Test Rules Without Causing a Mass Revolt

A common pitfall I see is platform teams rolling out a new, strict policy that immediately breaks existing workflows or blocks a critical production deployment. This is the moment where the platform team is seen as a blocker, causing friction and frustration. How do you enforce standards without causing a developer revolt?

This is where Kyverno’s distinction between preventive (Enforce) and detective (Audit) controls becomes a lifesaver.

Enforce Mode: This is a preventive control. It actively blocks any resource that violates the policy before it can be created or updated in the cluster.

Audit Mode: This is a detective control. It allows non-compliant resources to be created but flags them for review by generating policy reports.

This dual-mode capability allows for a safe, gradual rollout strategy. As Adevinta's engineering team noted in their tech blog on their transition to Kyverno:

"In audit mode, Kyverno will not reject any request but instead will produce a resource inside the cluster called ’admissionreport.kyverno.io’ and also ‘policyreport.kyverno.io’."

This enables a practical and low-stress workflow for introducing new policies:

1. Deploy in Audit Mode: Initially, deploy all new policies with validationFailureAction: Audit.

2. Monitor Reports: Observe the generated policy reports to see which existing workloads are non-compliant and understand the real-world impact of the new rule.

3. Collaborate and Remediate: Work with the relevant development teams to help them fix their configurations based on the audit data.

4. Switch to Enforce Mode: Once you’ve confirmed that all critical workloads are compliant, you can confidently switch the policy to validationFailureAction: Enforce, knowing it won't cause unexpected disruptions.

This feature transforms policy implementation from a high-risk, all-or-nothing event into a gradual, data-driven process that builds trust between platform and development teams.

CEL Expressions: It’s Like Excel Formulas, but for Kubernetes Security

One of the game-changing features in modern Kubernetes policy enforcement is the Common Expression Language (CEL). CEL provides a standardized, declarative way to write validation logic directly into Kubernetes resources. This is a significant shift: for many common validation rules, you may no longer need a separate policy engine like Gatekeeper or even Kyverno's own pattern-matching engine; you can use the native capabilities that Kubernetes now provides. And because Kyverno offers first-class support for this standard, it gives you an incredibly powerful toolset.

Think of CEL expressions like writing formulas in a spreadsheet. You're given an object (the Kubernetes resource YAML being submitted) and you can write simple, powerful expressions to inspect its fields and validate its contents.

Here are a few common examples of what you can do with CEL:

Checking for a specific label: object.metadata.labels.team == 'platform' This simple check ensures a resource is correctly attributed to the platform team.

Ensuring all containers have CPU limits:

• This prevents runaway processes from starving other workloads and causing cluster-wide instability.

Requiring at least one annotation: has(object.metadata.annotations) && object.metadata.annotations.size() > 0 This is useful for ensuring resources have the necessary metadata for monitoring or automation tools.

Validating an image registry using regex: object.spec.containers.all(container, container.image.matches('^my-trusted-registry.io/.*')) This is a critical supply-chain security control, ensuring that only vetted images from your organization's registry can run in the cluster.

With CEL, you get a simple yet incredibly powerful way to write complex, fine-grained validation rules directly in your Kyverno policies, making them more expressive and easier to maintain.

Real Talk: How to Gently Force People to Use Labels

A common problem in any large cloud or Kubernetes environment is inconsistent or missing labels. Without proper tagging, it becomes nearly impossible to answer basic questions like, "How much does this application cost?" or "Who owns this service?" This lack of metadata hygiene leads to operational blind spots and makes cost allocation a nightmare.

The solution is to define a tagging standard and then enforce it with a validation policy. First, decide on a set of required labels (e.g., app, owner, cost-center). Second, create a Kyverno policy to ensure every new resource has them.

This ClusterPolicy uses CEL to validate that all new Deployments contain the required labels:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-standard-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-for-standard-labels
      match:
        resources:
          kinds:
            - Deployment
      validate:
        cel:
          expression: "has(object.metadata.labels) && ['app', 'owner', 'cost-center'].all(key, key in object.metadata.labels)"

A policy like this is the perfect candidate for the "Audit-then-Enforce" strategy. You can start by deploying it in Audit mode to discover which teams need to update their deployment manifests. This approach prevents the developer revolt that often happens when governance teams try to enforce tagging standards retroactively. It's about learning from the common mistakes seen in other ecosystems (like Azure) and applying a more intelligent, DevOps-native solution. This is the "gentle" part of the enforcement. Once you've given teams time to comply, you can switch to Enforce mode.

The payoff is immense: operational blind spots are eliminated, 'who owns this?' becomes a solved problem. Finance teams can finally get the clear cost allocation reports they've been asking for. This isn't just about hygiene; it's about running Kubernetes like a business.


Conclusion

Kyverno's validation policies are more than just a security tool; they are a flexible, powerful, and developer-friendly way to bring order and sanity to your Kubernetes clusters. By acting as helpful guardrails, offering a safe audit mode for rollouts, leveraging the power of CEL, and enforcing operational best practices like tagging, you can build a more secure, reliable, and collaborative platform.

To leave you with a final thought: What is the one small, unenforced standard in your cluster that, if automated with a policy, would have the biggest positive impact?

💡
If you found this useful, subscribe to our newsletter for more deep dives into cloud-native technologies. In the next article in this series, we’ll explore Generation Policies and show how they can automatically create resources like NetworkPolicies and RBAC rules to secure further and standardize your workloads.

Taming the Kubernetes Chaos: A Friendly Guide to Kyverno

Part 1 of 3

Stop "YOLO deployments" breaking your cluster! Master Kubernetes governance with Kyverno. This series covers Policy-as-Code, security, and automation using native YAML. No complex Rego—just a smarter, safer K8s journey from zero to production.

Up next

The Kubernetes Cleaning Fairy: Fixing Messy Manifests with Mutation

In a previous article, we laid the foundations for governing Kubernetes clusters, focusing on how admission policies act as essential gatekeepers. They ensure that only compliant, secure, and well-formed resources make it into your environment. But w...