theplatformlog

One login, every cluster: when is it actually safe to broadcast an OIDC token?

Adding opt-in multi-cluster token broadcast to Headlamp — and why issuer + client-id + audience is the whole ballgame.

kubernetes oidc headlamp authentication security go

Headlamp is a web UI for Kubernetes. Point it at a kubeconfig with ten cluster contexts and it’ll happily manage all ten. But if every one of those clusters trusts the same OIDC identity provider — one Keycloak realm, one Okta app, one Dex — you still log in ten times, once per cluster, even though a single IdP issued the token each time.

That’s the itch behind PR #5929 (currently a draft, opt-in, default-off). The fix sounds trivial — “just reuse the token across clusters” — and the interesting part is exactly why it isn’t trivial. Broadcasting an auth token safely turns out to depend on a narrow invariant, and the real-world session lifetimes (looking at you, EKS) reframe the whole feature.

Why you log in once per cluster

Headlamp stores its auth as a per-cluster cookie (headlamp-auth-<cluster>). The OIDC login flow runs per context: you pick a cluster, get bounced to the IdP, come back with an id_token, and Headlamp sets the cookie for that cluster. Nothing carries the result over to the sibling contexts, so each one makes you do the dance again.

The naive instinct is “copy the token to all the other cookies.” Don’t — not blindly. A bearer token isn’t a skeleton key. A Kubernetes apiserver configured for OIDC only accepts an id_token that it can validate against its trusted issuer and audience. Hand cluster-b a token minted for a different OIDC app and the apiserver rejects it (or, worse, you’ve widened the blast radius of a credential in a way nobody reasoned about).

The invariant that makes broadcasting safe

Here’s the load-bearing observation. An OIDC id_token carries an aud (audience) claim, and in the standard kube OIDC setup that audience defaults to the client-id. An apiserver trusts tokens from a specific idp-issuer-url carrying a specific client-id as the audience. So the same id_token is valid at two different clusters precisely when both clusters’ OIDC config shares:

  1. the same idp-issuer-url, and
  2. the same client-id.

That’s the core safety condition. Broadcast the cookie only to contexts that match the source on both, and the token you’re copying is one the target apiserver was always going to accept. If only one matches, or neither does, you must not broadcast.

[Figure: architecture diagram]

So the feature is an opt-in flag plus a function that walks the kubeconfig and applies that filter:

// Gated by --oidc-use-token-broadcast (disabled by default).
func (c *HeadlampConfig) broadcastOIDCToken(
    w http.ResponseWriter, r *http.Request, sourceCluster, token string,
) {
    sourceContext, err := c.KubeConfigStore.GetContext(sourceCluster)
    if err != nil {
        // Log the error here; without a source context there is nothing to do.
        return
    }

    sourceOIDCConfig, err := sourceContext.OidcConfig()
    if err != nil || sourceOIDCConfig == nil {
        return // source isn't OIDC — nothing to broadcast
    }

    kContexts, err := c.KubeConfigStore.GetContexts()
    if err != nil {
        return // can't enumerate targets, so nothing to broadcast
    }
    for _, kCtx := range kContexts {
        if kCtx.Name == sourceCluster {
            continue // never broadcast to self
        }
        if kCtx.AuthType() != "oidc" {
            continue // skip SA-token / exec / other auth
        }
        oidcConfig, err := kCtx.OidcConfig()
        if err != nil || oidcConfig == nil {
            continue
        }
        // The invariant:
        if oidcConfig.IdpIssuerURL != sourceOIDCConfig.IdpIssuerURL ||
            oidcConfig.ClientID != sourceOIDCConfig.ClientID {
            continue
        }
        auth.SetTokenCookie(w, r, kCtx.Name, token, c.BaseURL, c.SessionTTL)
    }
}

Three properties worth calling out, because they’re what a reviewer checks:

The behavior is pinned by a table of unit tests — matching issuer+client-id broadcasts; mismatched issuer skipped; mismatched client-id skipped; non-OIDC skipped; source-without-OIDC is a no-op; mixed contexts broadcast only to the matches. The filter is the kind of thing that rots silently if you let it, so every branch gets a case.
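A table of that kind could look roughly like the sketch below. It is not the PR’s test file: `kubeContext`, `oidcConfig`, `shouldBroadcast`, and the example issuer URLs are hypothetical stand-ins that pull the per-target filter into a pure function so each branch can be named and checked. The "mixed contexts" case is a property of the loop, so it is left out here and a self-target case is included instead.

```go
package main

import "testing"

// oidcConfig mirrors the two fields the filter compares (hypothetical type).
type oidcConfig struct {
	IdpIssuerURL string
	ClientID     string
}

// kubeContext is a trimmed-down, hypothetical stand-in for a kubeconfig context.
type kubeContext struct {
	Name     string
	AuthType string
	OIDC     *oidcConfig
}

// shouldBroadcast reports whether source's token may be copied to target.
func shouldBroadcast(source, target kubeContext) bool {
	if source.OIDC == nil || target.OIDC == nil {
		return false // source isn't OIDC, or target has no OIDC config
	}
	if target.Name == source.Name || target.AuthType != "oidc" {
		return false // never broadcast to self or to non-OIDC contexts
	}
	return target.OIDC.IdpIssuerURL == source.OIDC.IdpIssuerURL &&
		target.OIDC.ClientID == source.OIDC.ClientID
}

var srcA = kubeContext{"cluster-a", "oidc", &oidcConfig{"https://idp.example.com", "headlamp"}}

var broadcastCases = []struct {
	name   string
	source kubeContext
	target kubeContext
	want   bool
}{
	{"matching issuer and client-id", srcA,
		kubeContext{"cluster-b", "oidc", &oidcConfig{"https://idp.example.com", "headlamp"}}, true},
	{"mismatched issuer", srcA,
		kubeContext{"cluster-b", "oidc", &oidcConfig{"https://other.example.com", "headlamp"}}, false},
	{"mismatched client-id", srcA,
		kubeContext{"cluster-b", "oidc", &oidcConfig{"https://idp.example.com", "grafana"}}, false},
	{"non-OIDC target", srcA,
		kubeContext{"cluster-b", "token", nil}, false},
	{"source without OIDC", kubeContext{"cluster-a", "token", nil},
		kubeContext{"cluster-b", "oidc", &oidcConfig{"https://idp.example.com", "headlamp"}}, false},
	{"self", srcA, srcA, false},
}

// TestShouldBroadcast runs every row as a named subtest.
func TestShouldBroadcast(t *testing.T) {
	for _, tc := range broadcastCases {
		t.Run(tc.name, func(t *testing.T) {
			if got := shouldBroadcast(tc.source, tc.target); got != tc.want {
				t.Errorf("shouldBroadcast = %v, want %v", got, tc.want)
			}
		})
	}
}
```

Keeping the predicate free of HTTP and cookie concerns is a design choice of this sketch: a new branch in the filter then needs only one more row in the table.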

The honest part: this is a login-time fix for a refresh-time problem

Here’s where it gets real, and where the feature’s value is narrower than it first looks.

The broadcast fires once, at login. It does not run when the token later refreshes. And tokens refresh sooner than you’d like.

Concrete example: EKS. An EKS apiserver validating OIDC id_tokens honors the token’s expiry, and those tokens are typically short, on the order of an hour. Headlamp’s frontend keeps you logged in by silently renewing the JWT in the browser (using the refresh token) on the IdP’s cadence and writing the fresh token back. But that refresh updates only the source cluster’s cookie: the refresh logic lives in pkg/auth and knows nothing about broadcast targets. The sibling clusters are still holding the original token from an hour ago.

So the lived experience on EKS is:

  1. Log into cluster-a. Broadcast sets cookies for cluster-b and cluster-c. For the next ~hour, all three are reachable with one login. 🎉
  2. The id_token expires. The browser refreshes it. cluster-a sails on with the new token.
  3. cluster-b and cluster-c are still presenting the expired token → apiserver returns 401 → you’re prompted to re-login on the siblings. 😞

In other words, login-broadcast buys you roughly one token lifetime of true single-sign-on, then the siblings fall back to per-cluster login at the first refresh. On a 1-hour-expiry cluster that’s an hour of relief; the recurring friction lives at the refresh boundary, not the login boundary.

That’s why “refresh-path broadcasting” is deliberately scoped out of this PR rather than hand-waved into it: extending the broadcast to the refresh path means an API change in pkg/auth so the refresher can see the set of broadcast targets and re-stamp their cookies too. It’s the part that actually closes the loop for short-lived-token environments — and it deserves its own PR with its own review, not a quiet rider on this one.
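One possible shape for that follow-up, purely as a sketch: the refresher receives a callback that recomputes the matching targets at refresh time, then re-stamps them with the new token. `tokenStore`, `refresher`, `targetsFor`, and `onRefresh` are hypothetical names and do not reflect the current pkg/auth API.

```go
package main

// tokenStore is a hypothetical stand-in for the per-cluster auth cookies,
// mapping cluster name to the token currently stored for it.
type tokenStore map[string]string

// refresher is a hypothetical refresh hook that knows about broadcast targets.
type refresher struct {
	store tokenStore
	// targetsFor returns the clusters that share issuer + client-id with
	// source. It is evaluated on every refresh, so a kubeconfig change
	// between login and refresh is picked up.
	targetsFor func(source string) []string
}

// onRefresh stores the new token for source, re-stamps every matching
// sibling, and returns the siblings it updated.
func (r *refresher) onRefresh(source, newToken string) []string {
	r.store[source] = newToken
	if r.targetsFor == nil {
		return nil // broadcast disabled: behave exactly as before
	}
	var restamped []string
	for _, target := range r.targetsFor(source) {
		if target == source {
			continue // never broadcast to self
		}
		r.store[target] = newToken
		restamped = append(restamped, target)
	}
	return restamped
}
```

Leaving `targetsFor` nil keeps the default-off behavior, which is one way the opt-in property could carry over to the refresh path.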

There’s a second documented caveat: a target apiserver running with --oidc-extra-audience can require an audience that the source token doesn’t carry. The current code matches on issuer + client-id but doesn’t detect audience overrides — so it could set a cookie that the target then rejects. Today that’s documented and left to deployment configuration; a fair review ask is to detect and skip those targets rather than assume them away.
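A detect-and-skip check might look like the sketch below. It assumes the token’s aud values have already been extracted and that the target’s extra audience is known from somewhere. Where that value would come from is an open question, so `extraAudience` is a hypothetical input, as are the function names.

```go
package main

// hasAudience reports whether want appears among the token's aud values.
func hasAudience(tokenAud []string, want string) bool {
	for _, a := range tokenAud {
		if a == want {
			return true
		}
	}
	return false
}

// audienceOK reports whether a token is plausible for a target: it must carry
// the target's client-id as an audience, and, if the target declares an extra
// required audience (hypothetical input), it must carry that one as well.
func audienceOK(tokenAud []string, clientID, extraAudience string) bool {
	if !hasAudience(tokenAud, clientID) {
		return false
	}
	if extraAudience != "" && !hasAudience(tokenAud, extraAudience) {
		return false
	}
	return true
}
```

A target that fails `audienceOK` would simply be skipped in the broadcast loop, the same way a mismatched issuer is.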

Two shapes of the same problem

Issue #4283 surfaced two ways to do multi-cluster auth, and they aren’t competitors — they fit different deployments:

A as an opt-in flag doesn’t preclude B; they cover different topologies. The PR says as much and offers B as a separate follow-up.

The takeaway

The lesson that generalizes past Headlamp: auth tokens are not portable by default, and OIDC makes them portable across apiservers under a precise condition — same issuer, same client-id (and watch the audience). If you ever find yourself wanting to “just reuse the token,” that triple is the checklist that tells you whether it’s safe.

And the second lesson, the one the EKS hour taught me: for any token-based SSO, the user-visible win is governed by token lifetime, not login count. A login-time optimization is the easy 80%; the refresh path is where multi-cluster SSO actually lives.

PR #5929 is a draft, opt-in, and default-off — zero behavior change for existing deployments — and I’m watching for maintainer direction on whether they want A, B, or both, and on the refresh-path follow-up. If you run Headlamp across clusters behind one IdP, the PR is the place to weigh in.