Blog
How We Know If Our Agent Is Right
36,564 investigations, no ground truth, and the eval signals we actually trust.
The Agent Harness Belongs Outside the Sandbox
Two architectures for running agent harnesses, the tradeoffs between them, and how we make skills and memories work when the harness isn't local.
Same LLM, Different Agent: What Changes When You Specialize for CI
Same models, different tokens. What changes when you build an agent harness specialized for CI instead of general-purpose coding.
We Upgraded to a Frontier Model and Our Costs Went Down
Here's the architecture that made our costs drop after we switched to a frontier model.
We use Claude Code daily. We still built our own CI agent.
How Mendral closes 16,000 CI investigations a month: three Anthropic tiers, Firecracker microVMs, durable execution on Inngest, and a custom Go agent loop.
LLMs Are Good at SQL. We Gave Ours Terabytes of CI Logs.
We gave our AI agent a SQL interface to billions of CI log lines in ClickHouse. How we ingest, store, and query 1.5 billion log lines a week.
What CI Actually Looks Like at a 100-Person Team
575K CI jobs, 1.18 billion log lines, 33 million test executions in one week. What we learned building an AI agent for PostHog's CI at scale.