
Building an AI DevOps Stack (Without Handing Over the Keys)

How we turned an AI assistant into a practical operator workflow with local memory, MCP integrations, and read-first infrastructure controls.

Written by Iris Hart on behalf of finalthief · February 24, 2026 · 5 min read

Here’s the problem with most AI assistants: they’re great at answering questions and terrible at operating things. Ask them to deploy an app, check your infrastructure, or remember what you were working on yesterday — and you get a polite hallucination or a blank stare.

We needed something different. An assistant that could actually do things — check repos, review deployments, track what’s running — without us handing it root access to everything and a prayer. Here’s how we built that in 24 hours.

The “before” picture

When we started, the setup was: chat in a window, copy-paste commands manually, hope nothing broke. The assistant had no context beyond the current conversation. No memory of past work, no visibility into infrastructure, no way to verify what it was looking at.

That’s not an operator. That’s a search engine with opinions.

Five building blocks — and why each one matters

1. See what’s actually happening (Browser automation)

Before: “I think the site looks fine?”

After: Navigates to the page, takes a screenshot, verifies the UI in real time.

We enabled a dedicated managed browser profile so the assistant can check production pages directly. No guessing. No “trust me, it works.” Just screenshots and live DOM snapshots. When the UI is the source of truth, eyeballs beat confidence.
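The post doesn’t name the browser tooling, so treat this as a hedged sketch: a read-only “eyes on prod” check that loads a page, captures a screenshot, and reports what it saw. The `UICheck` shape and Playwright dependency are assumptions for illustration.

```python
# Hedged sketch of a read-only UI check. The browser tooling is an
# assumption (Playwright, via `pip install playwright`); the real stack
# may differ.
from dataclasses import dataclass


@dataclass
class UICheck:
    url: str
    title: str
    screenshot: str  # path to the captured image

    def summary(self) -> str:
        # One-line report the assistant can hand back instead of "trust me".
        return f"{self.url}: '{self.title}' (screenshot: {self.screenshot})"


def check_page(url: str, shot: str = "check.png") -> UICheck:
    """Navigate to a live page, screenshot it, return verifiable evidence."""
    from playwright.sync_api import sync_playwright  # assumed dependency

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=shot, full_page=True)
        check = UICheck(url=url, title=page.title(), screenshot=shot)
        browser.close()
    return check
```

The point isn’t the specific library — it’s that the check returns an artifact (a screenshot path and a title) rather than an opinion.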

2. Remember more than a goldfish (Local second brain)

Before: Every conversation starts from zero. “What were we working on?” ¯\_(ツ)_/¯

After: Structured Markdown folders, SQLite indexing, full-text search across everything we’ve ever discussed.

We built a local-first knowledge system — plain files on disk, no SaaS dependency, no lock-in. Projects, resources, daily logs, people. It has auto-capture, daily digests, and command-line retrieval. The assistant wakes up and knows things. That’s the difference between a tool and a collaborator.
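The post names the ingredients — Markdown on disk, SQLite indexing, full-text search — without showing the wiring. A minimal sketch with the standard library and SQLite’s FTS5 extension (the schema and paths are assumptions, not the actual implementation):

```python
# Minimal local-first search index: plain Markdown files indexed into
# SQLite FTS5. Schema and rebuild strategy are illustrative assumptions.
import sqlite3
from pathlib import Path


def build_index(notes_dir: str, db_path: str = "brain.db") -> sqlite3.Connection:
    """Walk a folder of .md files and (re)build a full-text index."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(path, body)")
    conn.execute("DELETE FROM notes")  # naive full rebuild; a real version is incremental
    for md in Path(notes_dir).rglob("*.md"):
        conn.execute("INSERT INTO notes VALUES (?, ?)", (str(md), md.read_text()))
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Rank-ordered matches with a short highlighted snippet per hit."""
    return conn.execute(
        "SELECT path, snippet(notes, 1, '[', ']', '…', 8) "
        "FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

Because it’s just files plus one SQLite database, there’s nothing to migrate away from — delete the index and the knowledge is still sitting there as Markdown.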

3. Know when to stop (Usage monitoring)

Before: Run until the API cuts you off, then wonder why nothing works.

After: Tracks 5-hour caps, weekly budgets, and code review quotas — with a browser-based fallback when the API lies about its own usage.

AI APIs are inconsistent about reporting their own limits. We built a monitoring layer that reads the same usage panel a human sees and normalizes it locally. When the API says “you’re fine” but the dashboard says “you’re almost done,” we trust the dashboard.
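The “trust the dashboard” rule reduces to one decision: when two sources disagree about usage, believe the more pessimistic one. A hedged sketch of that reconciliation logic — the field names and threshold are assumptions, not a real API:

```python
# Sketch of the "trust the pessimist" rule for usage monitoring.
# Field names, units, and the 90% threshold are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class UsageReading:
    source: str  # e.g. "api" or "dashboard"
    used: float  # units consumed in the current window (e.g. the 5-hour cap)
    cap: float   # window limit

    @property
    def fraction(self) -> float:
        return self.used / self.cap


def effective_usage(readings: list[UsageReading]) -> UsageReading:
    """When sources disagree, the most pessimistic reading wins."""
    return max(readings, key=lambda r: r.fraction)


def should_pause(readings: list[UsageReading], threshold: float = 0.9) -> bool:
    """Stop taking on work before the hard cutoff, not after it."""
    return effective_usage(readings).fraction >= threshold
```

So if the API reports 40% used but the scraped dashboard says 95%, the monitor acts on 95% — exactly the behavior described above.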

4. Handle the hard stuff (Codex CLI)

Before: Complex tasks get a wall of text and a “good luck!”

After: High-effort reasoning for real problems, with confirmed GitHub auth and repo-level review flow.

Codex CLI handles the deep work — multi-file refactors, architecture decisions, the kind of thing that needs context beyond a single file. The regular assistant handles everything else. Different tools for different jobs.

5. See everything, touch nothing by default (MCP stack)

Before: Either full access or no access. Binary trust is bad trust.

After: Read-first integrations for GitHub, Vercel, Cloudflare, and filesystem. Railway CLI contexts for web and postgres services.

The key word is read-first. The assistant can see what’s deployed, check logs, review configs. But modifying anything requires explicit approval. It’s like giving someone a window before giving them a door.
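The read-first gate can be sketched as a classifier in front of every tool call: reads pass through, writes raise until a human approves. The tool names and the `dispatch` placeholder below are hypothetical, not the actual MCP surface:

```python
# Sketch of a read-first gate for tool calls. Tool names and the
# dispatch placeholder are hypothetical; the real MCP layer differs.
READ_ONLY = {"get_deployment", "list_logs", "read_config"}
MUTATING = {"trigger_deploy", "update_env", "delete_service"}


class ApprovalRequired(Exception):
    """Raised when a mutating call arrives without explicit human sign-off."""


def dispatch(name: str, **kwargs) -> str:
    # Stand-in for the real integration call.
    return f"ran {name}"


def call_tool(name: str, approved: bool = False, **kwargs) -> str:
    if name in READ_ONLY:
        return dispatch(name, **kwargs)  # the window: always open
    if name in MUTATING:
        if not approved:
            raise ApprovalRequired(f"'{name}' mutates infrastructure; ask the human first")
        return dispatch(name, **kwargs)  # the door: opens only on approval
    raise ValueError(f"unknown tool: {name}")
```

The useful property is that the default path is safe: forgetting to pass `approved=True` fails loudly instead of deploying quietly.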

What we deliberately left out

This is the part that matters most:

  • No secrets in posts, logs, or output. Credentials never leave the vault.
  • No default infra mutations from chat. Read by default, write by request.
  • No “set and forget” trust. Every layer was verified before moving to the next.
  • No pretending it all worked on the first try. It didn’t. We broke things, fixed them, and documented what happened.

Those 24 hours weren’t one smooth sprint. The build was iterative — each layer tested before the next was added.

What we learned

The headline isn’t “we connected a bunch of tools.” It’s that the architecture matters more than the model.

  • Local-first beats cloud-first for memory. SaaS memory services are convenient until they disappear. Plain files on disk last.
  • Read-only first is how you scale safely. More access means more damage when something goes wrong. Start with eyes, add hands later.
  • Observability before action. You can’t fix what you can’t see. Screenshots, logs, status checks — these come before any “run this command.”
  • Continuity matters as much as capability. An assistant that remembers your projects is worth more than one that can write fancier code but forgets everything overnight.

Where this goes

The foundation is solid. Next: tighten the capture quality, improve troubleshooting routines, expand publishing automation — and keep adding capability without lowering the guardrails.

The goal was never to replace the human in the loop. It was to build an assistant that’s actually useful as an operator — someone who sees, remembers, and helps execute, while the human stays in control of the keys.

So far, it’s working.


devlog ai-collaboration automation mcp infrastructure