Building a WCAG AA Contrast Budget into Your CI Gate

We Shipped Accessible UI, Then Quietly Broke It

A few months ago I shipped a rating component on a client app. Five stars, the unselected ones at 0.35 opacity so they were visibly there but clearly not lit up. WCAG AA contrast verified. Lighthouse score 100. Done.

Two weeks later, polishing the component for a different surface, I tweaked the unselected opacity from 0.35 to 0.25. A one-line CSS change. Lighthouse still scored 100. No test failed. No reviewer flagged it. The component shipped to prod.

It took a screenshot from a user with reduced vision to surface the regression. The read-only star displays on the profile and the activity feed — same component, different context — had gone from "comfortably readable" to "lean-in to count the stars." The contrast had decayed by 0.15 against the page background. Undetectable to the engineer who shipped it. Invisible to every automated check we had.

That was the trigger for the system this post describes: a CI gate that measures actual pixel contrast on the deployed scene and refuses to land changes that break the budget. The same way your CI refuses to land changes that bloat the bundle.

A Contrast Budget Is a Bundle Budget for Accessibility

Most teams treat WCAG conformance as a milestone. "We did the accessibility pass last quarter." A snapshot taken once, signed off, forgotten until next quarter or the next legal review.

The problem is that contrast decays silently. A color token bumped by 5%. A new background image loaded under existing text. A shader uniform retuned. A campaign banner slid in above the fold. None of those individually scream "accessibility regression," but each one nibbles at the margin between your text color and the surface it sits on. The Lighthouse audit ran once and was fine. The next change ships without a re-audit. Nobody is watching.

You already have a pattern that handles this for performance. Bundle-size budgets fail the build if you exceed them; a figure like "First Load JS shared by all = 87.7 kB" in a Next.js build summary is exactly the kind of number you gate on. You don't audit bundle size quarterly. You measure it on every PR, and the gate fires when it drifts.

A contrast budget is the same shape:

1. The current worst-case readings for each text element become the budget floor.
2. The CI gate measures pixel contrast on every PR against the deployed scene.
3. A regression past threshold (or past the margin you've agreed to hold) fails the workflow.
4. The gate's error output names the exact element and y-position that broke, so the fix is obvious.

That's it. The implementation is non-trivial only because canvas-backed text — the kind of text rendered over a procedural background — needs a different probe than the axe-core / Lighthouse contrast checkers most teams already have.
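The budget check itself is small. What follows is a minimal sketch, not the post's actual implementation: the budget format, the element names, and the y values are made up for illustration, and only the ratios (3.11, 4.51, 4.49) and the 3.0 and 4.5 floors come from this post.

```javascript
// Hypothetical budget format: one contrast floor per text element.
const budget = {
  "hero-h1": { floor: 3.0 },       // large text, WCAG AA
  "hero-subtitle": { floor: 4.5 }, // body text, WCAG AA
};

// readings: [{ element, y, ratio }], one entry per sampled position.
// Returns the worst breach per element, or an empty array if the budget holds.
function findBreaches(readings, budget) {
  const worst = new Map();
  for (const r of readings) {
    const entry = budget[r.element];
    if (!entry || r.ratio >= entry.floor) continue;
    const prev = worst.get(r.element);
    if (!prev || r.ratio < prev.ratio) {
      worst.set(r.element, { ...r, floor: entry.floor });
    }
  }
  return [...worst.values()];
}

// One line per breach, naming the element and the y-position that broke.
function formatBreach(b) {
  return `${b.element} at y=${b.y}: ${b.ratio.toFixed(2)}:1 is below the ${b.floor.toFixed(2)}:1 floor`;
}

// Illustrative readings (the y values are invented).
const readings = [
  { element: "hero-h1", y: 212, ratio: 3.11 },
  { element: "hero-subtitle", y: 340, ratio: 4.51 },
  { element: "hero-subtitle", y: 350, ratio: 4.49 },
];

const breaches = findBreaches(readings, budget);
for (const b of breaches) console.error(formatBreach(b));
// In a real probe, a non-empty list would set process.exitCode = 1,
// which is what makes the CI step fail.
```

Keeping the comparison in a pure function means the gate logic can be unit-tested without a browser, separately from the pixel sampling.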

The Probe: `gl.readPixels` for Canvas-Backed Text

Standard accessibility tooling assumes text sits on HTML elements with computed background colors. Axe-core can walk the DOM, read background-color properties, and compute the contrast ratio against the text's color. That works for most conventional UI.

It does not work when your text sits over a WebGL canvas. The canvas's "background" isn't a CSS color — it's whatever the GPU drew last frame. To axe-core, the canvas is an opaque element with no readable color. The contrast against the actual pixels behind your text? Unmeasurable.

The probe I built uses gl.readPixels() to sample the actual canvas pixels under each text element's bounding box. For each of N text elements:

1. Get the text's bounding box via getBoundingClientRect().
2. For each sample position inside the bbox (a 5×3 grid is enough), convert CSS coordinates to drawing-buffer coordinates: multiply by devicePixelRatio (more on that bug later) and flip the y-axis, because gl.readPixels counts rows from the bottom of the buffer.
3. Pin the scene's animation uniforms to a specific time so the sampled frame is deterministic.
4. Force-render the scene: renderer.render(scene, camera). This writes the pinned frame to the drawing buffer. The read has to happen in the same task, because with the default preserveDrawingBuffer: false the browser is free to clear the buffer once the frame has been presented.
5. Sample the pixel: gl.readPixels(px, py, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, buf).
6. Compute the relative luminance via the WCAG formula and the contrast ratio against the text color.
7. Report the worst-case across all sample positions and all phase pinnings.
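The coordinate conversion in the second step is where most mistakes happen, so here is a sketch of it. The function name and signature are hypothetical, and it assumes the drawing buffer is the canvas's CSS size multiplied by devicePixelRatio (as it would be after setting the renderer's pixel ratio to match the device).

```javascript
// bbox and canvasRect are DOMRect-like objects in CSS pixels, as returned by
// getBoundingClientRect(). bufferHeight is gl.drawingBufferHeight.
function samplePoints(bbox, canvasRect, dpr, bufferHeight, cols = 5, rows = 3) {
  const points = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      // Sample at cell centres so no point sits exactly on the bbox edge.
      const cssX = bbox.left + ((col + 0.5) * bbox.width) / cols;
      const cssY = bbox.top + ((row + 0.5) * bbox.height) / rows;
      // CSS pixels relative to the canvas -> drawing-buffer pixels.
      const px = Math.floor((cssX - canvasRect.left) * dpr);
      const fromTop = Math.floor((cssY - canvasRect.top) * dpr);
      // gl.readPixels has its origin at the bottom-left, so flip y.
      const py = bufferHeight - 1 - fromTop;
      points.push({ px, py });
    }
  }
  return points;
}

// Example: a 100x30 CSS-pixel text box on a 830px-tall canvas at DPR 2.
const points = samplePoints(
  { left: 100, top: 50, width: 100, height: 30 },
  { left: 0, top: 0 },
  2,
  1660
);
```

Each returned { px, py } pair can be passed straight to gl.readPixels(px, py, 1, 1, ...).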

The pinning is the crucial bit. An animated canvas (a water surface with godrays, in my case) has wildly different brightness at different uTime values. The worst-case bg luminance might happen at the godray pulse peak, or the wind trough, or somewhere in between. Sampling a single "current frame" will miss the actual worst case far more often than it hits it.

The pinning in code:

// Pin uTime across all materials that expose it, force-render, and read the
// pixel back inside the same page.evaluate call, so the drawing buffer
// still holds the frame we're measuring.
// px and py are drawing-buffer coordinates with the origin at bottom-left.
const [r, g, b] = await page.evaluate(({ phase, px, py }) => {
  const { scene, renderer, camera } = window.__pondV2;
  scene.traverse((o) => {
    if (o.material?.uniforms?.uTime) {
      o.material.uniforms.uTime.value = phase;
    }
  });
  renderer.render(scene, camera);

  // Sample the canvas pixel underneath the text element's bbox.
  // renderer.getContext() assumes a three.js WebGLRenderer.
  const gl = renderer.getContext();
  const buf = new Uint8Array(4);
  gl.readPixels(px, py, 1, 1, gl.RGBA, gl.UNSIGNED_BYTE, buf);
  return Array.from(buf);
}, { phase: godrayPhase, px, py });

// Back in Node: turn the sampled RGB into a contrast ratio.
const luminance = relativeLuminance(r, g, b);
const ratio = wcagContrast(textColorL, luminance);
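The relativeLuminance and wcagContrast helpers called in the snippet above are not shown in this post. A minimal sketch that follows the WCAG 2.x definitions of relative luminance and contrast ratio; linearize and hexToRgb are helper names introduced here for illustration.

```javascript
// sRGB channel (0-255) -> linear-light value, per the WCAG 2.x definition.
function linearize(channel) {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of an sRGB colour, in the range 0 (black) to 1 (white).
function relativeLuminance(r, g, b) {
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Takes two relative luminances and returns the ratio, lighter over darker.
function wcagContrast(l1, l2) {
  const lighter = Math.max(l1, l2);
  const darker = Math.min(l1, l2);
  return (lighter + 0.05) / (darker + 0.05);
}

// "#rrggbb" -> [r, g, b]
function hexToRgb(hex) {
  const n = parseInt(hex.replace("#", ""), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Luminance of the subtitle cream mentioned later in the post.
const textColorL = relativeLuminance(...hexToRgb("#e8dcc4"));
```

Because wcagContrast sorts its inputs, argument order does not matter, which avoids a sign error when the background is occasionally brighter than the text.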

Running this across 8 godray phases × 9 y-offsets × 15 positions × 2 viewports × 5 text elements yields 10,800 contrast readings per PR (5,400 per viewport run). Each probe run finishes in about 60 seconds.
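Structurally, the sweep is a nested loop that keeps the lowest ratio it sees. A sketch under stated assumptions: measure is a stand-in for "pin uTime, render, readPixels, compute the ratio" and is injected so the loop runs without a browser; a real probe would await it. The even phase spacing and the period value are also assumptions, not something this post specifies.

```javascript
// Returns the single worst reading across every phase, y-offset and point.
function worstCase(phases, yOffsets, points, measure) {
  let worst = null;
  for (const phase of phases) {
    for (const yOffset of yOffsets) {
      for (const point of points) {
        const ratio = measure(phase, yOffset, point);
        if (worst === null || ratio < worst.ratio) {
          worst = { ratio, phase, yOffset, point };
        }
      }
    }
  }
  return worst;
}

// Evenly spaced phases over one period (the period is an assumption here).
function evenPhases(count, period) {
  return Array.from({ length: count }, (_, i) => (i * period) / count);
}
```

Returning the phase, offset, and point alongside the ratio is what lets the failure message say where the worst reading occurred rather than only that one exists.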

The CI Gate: Vercel Preview + GitHub Actions + Path Filters

The probe is the measurement. The gate is the enforcement.

The pattern uses three pieces that probably already exist on your project if you deploy on Vercel:

1. Vercel auto-deploys a preview URL on every push to any branch. Every PR gets its own ephemeral deployment with a unique URL.
2. GitHub Actions waits for that preview to be Ready, grabs the URL, and hands it to the probe. The patrickedqvist/wait-for-vercel-preview action does this in a few lines of YAML.
3. The probe runs against the preview URL, exits non-zero on regression, and the workflow fails. The PR can't merge until the contrast budget passes.

Path filters keep the workflow cheap. It only runs on PRs that touch files that can affect canvas luminance — shaders, scene composition, hero overlay components. A PR that updates the README doesn't trigger a contrast probe.

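on:
  pull_request:
    paths:
      - 'lib/koi-pond-v2/**'
      - 'app/koi-debug/**'
      - '.github/contrast-probe/**'
  push:
    branches: [main]
    paths:
      - 'lib/koi-pond-v2/**'
      - 'app/koi-debug/**'

jobs:
  contrast-budget:
    runs-on: ubuntu-latest
    steps:
      # The probe script lives in the repo, so check it out first.
      - uses: actions/checkout@v4
      # Install the probe's dependencies (Playwright and a browser) here.
      - uses: patrickedqvist/wait-for-vercel-preview@v1.3.2
        id: preview
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
      # Desktop viewport. The block scalar keeps the line continuations intact.
      - run: |
          node probe.mjs ${{ steps.preview.outputs.url }} \
            --url /koi-debug 1328 830 \
            --text-y-sweep 9 --godray-phases 8
      # Mobile viewport.
      - run: |
          node probe.mjs ${{ steps.preview.outputs.url }} \
            --url /koi-debug 390 844 \
            --text-y-sweep 9 --godray-phases 8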

Two viewports (desktop 1328×830, mobile 390×844). Two probe invocations per PR. Total CI cost: about $0.04 per PR at GitHub Actions Linux pricing. The probe's output on failure names the exact element and worst pixel, so the engineer fixing the breach doesn't have to debug what failed — just why.

The Carry-Forward Pattern: Gates Become Budgets Become Memory

When the gate catches something, fixing the breach isn't enough. The cause becomes a documented carry-forward — a numbered entry in the project memory that captures the wire-tight margin, what shifted it, and what future change is at risk of breaking the same boundary.

Today my project has 34 of these. Some examples:

#19 — Mobile H1 line 1 cream margin 3.11:1. This is the tightest reading anywhere in the matrix. Any future change that brightens caustic intensity, godray peak, or water-specular reflectivity must be re-probed against this number first. If the change drops the margin below 3.0:1, the gate fires and the change rolls back.
#23 — Wind amplitude modulation transfers ~1:1 into worst-case bg luminance. The +15% modulation we tried produced a +16% bg brightness jump. Documented so the next person tuning wind coupling knows headroom must be ≥ modulation depth × ~1.07 (the empirical transfer ratio, 16 ÷ 15) or the gate will catch the breach.
#29 — Water depthWrite: true is load-bearing for godray containment. A subtle interaction discovered the hard way; if a future shader change disables it (for refraction, for example), godrays will silently bleed into the above-water region and break the safe-zone contrast. Inline comment + carry-forward + render-order diagnostic probe collectively make this a hard-to-regress invariant.

The carry-forwards are the institutional memory the gate enforces. The numbers in them are not abstract — they're the actual worst-case readings the probe produces, frozen as the floor the next change must respect.
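A margin like 3.11:1 against a 3.0:1 floor can be translated into "how much brighter may the background get", which is the number a shader tweak actually changes. A rough sketch, assuming light text over a darker background and an assumed text luminance of 0.72 (a plausible value for a cream color, not a reading from the probe); the function names are hypothetical.

```javascript
// For light text on a darker background:
//   ratio = (textL + 0.05) / (bgL + 0.05)
// Solving for bgL gives the background luminance that produces a given ratio.
function backgroundLuminanceAt(textL, ratio) {
  return (textL + 0.05) / ratio - 0.05;
}

// Fraction by which the worst-case background can brighten before the
// measured ratio falls to the floor.
function luminanceHeadroom(textL, measuredRatio, floorRatio) {
  const current = backgroundLuminanceAt(textL, measuredRatio);
  const limit = backgroundLuminanceAt(textL, floorRatio);
  return (limit - current) / current;
}

// 3.11 and 3.0 are the margin and floor from carry-forward #19.
// textL = 0.72 is an assumption for illustration only.
const headroom = luminanceHeadroom(0.72, 3.11, 3.0);
console.log(`${(headroom * 100).toFixed(1)}% background headroom`);
```

Under those assumptions the headroom comes out a little under 5%, far less than the +16% brightness jump recorded in #23, which is why a tight margin has to be re-probed before any brightening change lands.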

What It Caught

Three real catches across the last two months, each one a regression that would have shipped to production without the gate or the review discipline that grew up around it.

1. The wire-tight subtitle collapse. I added a slow camera drift: ±2° pan, ±1.5° tilt, 38-second period. The drift was supposed to be luminance-neutral; it just moves the camera, doesn't change the scene. The CI failed it anyway. The drift had shifted a bright water-specular spot by one pixel into the mobile subtitle's worst-case sample row, dropping the contrast from 4.51:1 to 4.49:1. Below the 4.5:1 body AA threshold by 0.01. A pre-gate version of me would have shipped it. The gate caught it, the fix was brightening the subtitle's cream by one luminance step from #e8dcc4 to #f0e8d4, and the next iteration stacked cleanly on top of the wider margin.

2. The DPR bug that would have shipped to retina users invisibly. I added a zone-graduated water transparency — opaque above the safe-zone line, fading to 30% alpha below — using gl_FragCoord.y / uResolution.y to compute the safe-zone band position in the shader. gl_FragCoord is in drawing-buffer pixels (CSS pixels × devicePixelRatio); I set uResolution to CSS pixels.

At DPR=1 (the Playwright test environment), the ratio worked out to [0..1] correctly. At DPR=2 (every retina Mac, every iPhone), it ranged [0..2] — pushing the safe-zone band entirely off the visible viewport. The CI gate at DPR=1 would have passed it. The implementer's self-review caught it before the CI even ran.
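The arithmetic behind that range is easy to model outside the shader. This is a sketch of the expression gl_FragCoord.y / uResolution.y near the top edge of the viewport; the function and its "css" / "buffer" parameter are illustrative, not part of the scene code.

```javascript
// Models gl_FragCoord.y / uResolution.y at (approximately) the top edge of
// the viewport. gl_FragCoord is always in drawing-buffer pixels.
function normalizedTopY(cssHeight, dpr, resolutionUnits) {
  const fragCoordY = cssHeight * dpr;
  // The bug: uResolution set in CSS pixels. The fix: drawing-buffer pixels.
  const uResolutionY = resolutionUnits === "css" ? cssHeight : cssHeight * dpr;
  return fragCoordY / uResolutionY;
}

const buggyAtDpr1 = normalizedTopY(844, 1, "css");    // looks fine in CI
const buggyAtDpr2 = normalizedTopY(844, 2, "css");    // band pushed off-screen
const fixedAtDpr2 = normalizedTopY(844, 2, "buffer"); // back in range
```

Two mitigations follow from this, both offered as suggestions rather than what the project did: set uResolution from the drawing-buffer size (three.js exposes it through renderer.getDrawingBufferSize), and run the probe at more than one pixel ratio, for example with Playwright's deviceScaleFactor context option set to 2.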

The lesson is the most important one in this post: a CI gate that passes at one device-pixel-ratio doesn't mean the bug isn't there at others. Self-review prompts must explicitly ask "what would the CI gate not catch?" before you trust the green checkmark.

3. The sweep that revealed the prior verification was over-confident. I had verified the contrast budget at a single text y-position and called the worst-case background luminance the "formal ceiling." When I later extended the probe to sample at five y-offsets across ±40 pixels — simulating the way text might shift during scroll parallax — three failing readings appeared that the static probe had missed entirely. The "ceiling" turned out to be just the static-position ceiling. The actual any-y ceiling was 13% brighter.

That reveal forced a token-wide cream-brightening pass and is the reason the gold accent on $2,500 in the homepage hero is now white. The wire-tight margin we live with today is a direct artifact of that discovery. It's also why the probe now samples 9 y-offsets at 10-pixel resolution by default — the sample-grid density is itself a tunable that needs deliberate calibration.
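For reference, a sketch of how a sweep of 9 offsets at 10-pixel resolution could be generated. It assumes the offsets are symmetric around the text's resting position, which matches the ±40 pixel range mentioned above but is not spelled out for the 9-offset default; ySweepOffsets is a hypothetical helper name.

```javascript
// Symmetric y-offsets centred on zero, in CSS pixels.
// count = 9, stepPx = 10 covers -40 through +40.
function ySweepOffsets(count, stepPx) {
  const half = (count - 1) / 2;
  return Array.from({ length: count }, (_, i) => (i - half) * stepPx);
}

const defaultSweep = ySweepOffsets(9, 10);
const earlierSweep = ySweepOffsets(5, 20); // five offsets across the same range
```

Both sweeps cover the same range; the denser one simply leaves smaller gaps in which a bright spot can hide between sample rows.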

Why This Scales Beyond One Project

The probe is parameterized: --url, --text-y-sweep, and --godray-phases all default to sensible values, and all can be overridden on the command line. The shape is independent of the koi pond. Any animated background that text sits over benefits: gradients, video, WebGL scenes, animated SVG illustrations, generative art. The pattern doesn't care what's drawing the pixels.

Cost is small. About 60 seconds of CI feedback per probe run. Roughly $0.04 per PR at GitHub Actions Linux pricing. Two probes per PR (desktop + mobile) is a tax I willingly pay because the gate, and the review habits it forced, have surfaced three bugs in two months that would otherwise have shipped to users.

The pattern composes with existing CI. The contrast budget runs in parallel with the bundle-size check, the lint check, and the test suite. None of them block each other. All four need to be green for the merge button to enable. The bundle budget protects performance. The lint check protects style. The test suite protects logic. The contrast budget protects accessibility. They're the same kind of gate.

What This Gives You

A CI gate that fails when accessibility regresses. The same shape as your bundle-size budget, your test suite, your lint check. Not a quarterly audit. Not a "we tested it once and it's fine." A measurement that runs on every push and surfaces real numbers when something drifts.

If your product has UI text over a procedural background — and increasingly, products do — this pattern is worth a half-day. The probe is about 200 lines of Node + Playwright. The GitHub Action is about 50 lines of YAML. The Vercel preview URL infrastructure is already there if you deploy on Vercel.

The bug that hides from your CI is the one your users find first. This is one less of those.