Built a reproducible eval harness over 130K partner records. p95 under 200ms, zero non-determinism flagged in CI.
AI eng cultures evaluate on token budget literacy, p95 latency receipts, deterministic output design, and eval sets you actually run.
These are the cultures we tuned the AIE rewrite against. Not exhaustive, just representative.
Two columns. Phrases on the left signal the archetype. Phrases on the right disqualify you in 5 seconds of skim.
Same operator, same outcomes. Three bullets rewritten in the AIE register so you can hear how the voice changes.
Built a reproducible eval harness over 130K partner records. p95 under 200ms, zero non-determinism flagged in CI.
Rewrote the partner-tier pipeline with a fixed token budget per request, killed 3 redundant model calls, cycle time dropped 41% on the same SLO.
Shipped a deterministic scoring service (temperature 0, schema-validated outputs). 92.9% coverage, every output replayable from a JSONL trace.
The bar past the resume. If the rewrite gets you in the room, these are the criteria the loop will weigh once you are there.
The AIE archetype is pre-selected on the sandbox. Hit Run and watch a real partner-ops resume rewrite itself with word-level diff.