Mendral is the AI DevOps Engineer — three always-on agents for security, reliability, and performance, plus custom automations for any DevOps work specific to your stack. It installs as a GitHub App and starts reading your CI logs immediately.

Is Mendral the same as Mistral AI?

No. Mendral (with an 'e') is a Y Combinator-backed company building the AI DevOps Engineer — agents that automate supply-chain security, CI reliability, build performance, and any custom DevOps automation you define. Mistral AI is a separate French company that builds general-purpose foundation models.

What does Mendral actually do?

Three always-on agents run from day one. The Security Agent reviews dependency PRs, pins safe versions, and surfaces only the CVEs actually exploitable in your code — catching compromised dependencies, malicious actions, and leaked secrets before they hit production. The Reliability Agent diagnoses CI failures and fixes flaky tests. The Performance Agent cuts build time via caching, parallelism, and slow-test pruning. On top of that, you can add custom automations triggered from CI/CD events, infra alerts (Datadog, Sentry, cloud deploys), schedules, Slack, Linear, or webhooks.

How does Mendral install?

Install the Mendral GitHub App, optionally connect Slack, and Mendral starts reading your CI logs immediately. First insights arrive within minutes and the first auto-fix typically lands within hours.

Which CI providers does Mendral support?

GitHub Actions is supported today. Buildkite, CircleCI, and GitLab CI support is coming next.

How much does Mendral cost?

Mendral uses a flat monthly rate priced by team size. There are no per-seat surprises, usage caps, or per-incident charges. Contact hello@mendral.com or visit mendral.com/pricing to get a quote.

Mendral was founded in 2025 by Sam Alba (former VP of Engineering at Docker and co-founder of Dagger) and Andrea Luzzardi (former Docker engineer and Dagger co-founder). The company is part of Y Combinator's Winter 2026 batch and based in San Francisco.

Is Mendral SOC 2 compliant?

Yes. Mendral is SOC 2 Type II compliant. See mendral.com/security for details.

The Agent Harness Belongs Outside the Sandbox

An agent harness is the loop that drives an LLM. It sends a prompt, gets a response, executes the tool calls the model requested, feeds the results back, and repeats until the model says it's done. Every production agent has one. The question is where it runs.

There are two answers. They have different security properties, different failure modes, and different implications for what the agent can do. The tradeoffs also look different depending on whether you're building a single-user agent (one engineer on a laptop) or a multi-user one (dozens of engineers in the same organization sharing the same agent). We're in the multi-user camp, which surfaces problems single-user builders don't hit.

The two architectures

Harness inside the sandbox

The loop lives in the same container as the code it's working on. LLM calls go out from inside the container. Tool calls (bash, read, write) execute locally. Skills, memories, and anything else the harness tracks are files on the container's filesystem.

This is what claude does when you run it on your laptop, and what it looks like when you spin up Claude Code in a remote container. If you're building a single-user agent, you can grab the Claude Code SDK and ship something that works.

Harness outside the sandbox

The loop runs on your backend. When it needs to execute a tool, it calls into a sandbox over an API. The sandbox runs the tool and returns the result. The loop never enters the sandbox.

Side-by-side architecture diagram. Left: the agent loop and tools both live inside the sandbox, and the LLM call exits through the sandbox boundary. Right: both the agent loop and all the tools live on the backend alongside the credentials. Some tools reach into a separate, narrow sandbox over a tool RPC interface to run bash or touch workspace files.

Tradeoffs

Running the harness inside the sandbox has a few things going for it. The execution model is simple: one container, one process tree, one filesystem, one lifetime. You can reuse off-the-shelf harnesses as-is. Skills and memories work unchanged because they assume a local filesystem and they get one.

Running the harness outside the sandbox gets you things the inside model can't.

Your credentials stay out of the sandbox. The loop holds the LLM API keys, the user tokens, the database access. The sandbox holds only the environment the agent needs to do its work. There's nothing in there for the agent to escape to, so there's no permission model to enforce and no credential leak to contain.

You can suspend the sandbox when the agent isn't using it. A lot of what an agent does doesn't need a sandbox at all: thinking, calling APIs, summarizing, waiting for CI. Some sessions never touch a sandbox. With the harness outside, you provision one only when the agent needs to run a command, and suspend it whenever it's idle. When the harness lives inside the sandbox you can't do any of this, because you can't suspend the thing the loop is running on.

Sandboxes become cattle. If one dies mid-session, the loop provisions a new one and keeps going. When the harness runs inside, the sandbox is the session, and losing it loses the session.

And multi-user stops being a distributed filesystem problem. Several engineers in the same organization run the same agent. They share skills, they share memories, they sometimes investigate the same incident in parallel. When the harness runs outside the sandbox, this is a shared database. When it runs inside, it's the distributed filesystem problem we'll come back to.

Off-the-shelf local harnesses stop working once you move the loop out, because they all assume a local filesystem. Durable execution becomes your problem, because an agent session can run for hours and has to survive deploys. And once the harness and the sandbox live on different machines, "filesystem" stops being a thing you can point at.

We picked the outside model. The rest of this post is about the three things we had to solve to make it work.

Durable execution

An agent loop is a long-running function. Minutes at a minimum, hours in our case. It has to survive rolling deploys, scale events, and instance failures. Keeping the loop in memory on an API server dies the first time you ship a new version.

We already run our CI ingestion pipeline on Inngest, which we wrote about in a previous post. Extending it to the agent loop was the same decision for the same reasons: good DX, no cluster to run ourselves, and we didn't need the full generality of Temporal. The loop is an Inngest function. Each turn is a step, and Inngest checkpoints each one. If the server restarts, the loop picks up where it left off.

Sandbox lifecycle

The loop is suspended most of the time: during LLM calls, between tool calls, while waiting on a long-running workflow like CI. We want the sandbox to be suspended too, and only active when the agent is running a command. The problem is cold starts. A cold sandbox takes seconds to spin up, which is forever inside an interactive turn.

We use Blaxel for this. Blaxel gives us 25ms resume from standby. We suspend the sandbox when the agent isn't running a command and resume it the instant it is. 25ms is low enough that the agent can't tell the sandbox was ever gone.

Timeline of one agent session. The agent track alternates between LLM thinking, short run-command segments, and a long stretch waiting for a CI workflow. The sandbox track mirrors it: active only during the run-command segments, suspended everywhere else, including the entire CI wait.

The filesystem

Modern agent harnesses aren't just bash and an LLM. They have skills (prompt fragments the agent reads on demand), memories (notes the agent writes for itself or the user), subagents, plans, todo lists. All of these assume a local filesystem. A skill is a file at .claude/skills/foo.md. A memory is a file at .claude/memory/MEMORY.md. The harness reads and writes them with the same read and write tools it uses for source code.

That works on a laptop. It doesn't work when the harness is outside the sandbox.

The sandbox is disposable. We treat it as ephemeral: suspended, resumed, killed, respawned. If it dies and we spin up a new one, whatever the agent wrote to .claude/memory/MEMORY.md is gone. You could keep a long-lived sandbox per session to preserve the state, but then you're back to babysitting one sandbox per session, and you lose every other property you wanted.

The other problem is multi-user. A user's laptop runs an agent for one person. Our agent runs for dozens of engineers in the same organization. Skills are organizational: everyone on a team shares the same triage playbook. Memories are too. If the agent learns on Monday that team X always deploys from a release branch, Tuesday's session for a different engineer on the same team should know.

You could pretend the sandbox has a local filesystem, write to it, and sync everything to a database on the way out. This works in the single-user case. In the multi-user case, you've just built a distributed filesystem. Two sessions running at the same time write to the same memory file, and you have to reconcile them. Three engineers trigger the agent on the same incident, and they all see stale state until their sessions end. Conflict resolution, eventual consistency, cache invalidation.

The clean answer is to stop pretending. Put memories and skills in a database. The harness reads them from the database when the agent asks for them and writes them back when the agent updates them.

But we still want the agent to think in terms of files.

One interface, two backends

The harness virtualizes filesystem access. The agent has one read tool, one write tool, one edit tool. When the agent calls them, the harness looks at the path and routes the call based on what the path means.

Paths under the workspace go to the sandbox, the way they always did. Paths under the skill and memory namespaces go to the database. A write to a memory path is a database transaction, scoped to the organization. A read to a memory path comes from the database too, so two parallel sessions in the same org see the same memory the instant it's written.

The agent doesn't know the difference. As far as it can tell, there's a filesystem and it reads and writes files. Some of those files live in Postgres. Some live in a sandbox running across the country.

A single read/write/edit tool API at the top flows into a path-dispatch router. Paths under /workspace/* route to the sandbox over RPC. Paths under /skills/* and /memory/* route to a Postgres database over SQL. One tool surface, two backends, invisible to the agent.

Why not just add tools

The obvious alternative is to give the agent memory_read and memory_write tools alongside read and write. That works, and it's what most people do. We did it ourselves before we had the virtualization layer.

The problem is that more tools make agents worse. Each tool dilutes the attention the model pays to every other tool, makes the prompt longer, and adds another decision the model has to make at every turn. Two tools that do almost the same thing, read and memory_read, are especially bad, because the model has to disambiguate them from context and will sometimes pick wrong.

The other reason matters more. Anthropic and everyone else training frontier models are almost certainly doing reinforcement learning on harnesses that look like Claude Code. That training shapes the models to be good at a specific API surface: read(path), write(path, content), edit(path, old, new). If you invent memory_read, you're off the trained path. You get whatever the model has learned in general, minus whatever it's learned about the exact conventions it was trained on.

The virtualized interface keeps the API surface the model was trained on and puts the database semantics where we need them on the backend.

What's still hard

The SOTA moves fast. Every few weeks a new pattern (subagents, plans, background tasks) lands in Claude Code or somewhere similar, and it almost always assumes a local filesystem. We can intercept most things, but there's always a gap between a new capability shipping and our virtualization layer handling it correctly. Not running stock Claude Code is a real cost.

We picked path prefixes (/skills/, /memory/) that mirror Claude Code's local layout, and that's probably going to bite us. Claude Code's layout is still moving, and we're one convention change away from having to migrate everything. The right answer might be to expose a different interface entirely. But see above: the whole point was to keep the interface identical to what the model was trained on.

Bash is a leak. The harness can intercept read('/skills/foo.md') because it's a structured tool call. But the agent also has a bash tool, and nothing stops it from running grep -r 'foo' /skills/ in a bash session. Bash bypasses the virtualization layer and hits the sandbox's real filesystem, where /skills/ doesn't exist. We handle this with two best-effort guards: the system prompt tells the agent not to use bash for virtualized namespaces, and we parse bash invocations with tree-sitter to catch calls that reach into those paths. Neither is airtight. It's good enough for now.

Consistency is the part we haven't answered. When two sessions in the same organization are both updating memory, what should they see? Strict serializability is tempting and probably wrong, because agents aren't databases and making one session block on another's write opens up deadlock patterns we don't have answers for. We're running last-writer-wins per key, which is fine for the cases we've hit and almost certainly going to break in ways we can predict.