The operating loop
Running Codex inside a government-adjacent workflow. The boring middle of AI adoption is what makes the model usable.
I. The wrong question
Most discussions of AI adoption in serious Indian institutions ask the wrong question. They ask whether the model can do the task. The right question is whether the task can be trusted, measured, escalated, repeated, and recovered from inside the institution.
The model is one component. The operating loop is the whole thing.
I write from practice. I run Codex, Claude, and Gemini as a daily operating stack inside my own work, which sits with senior ministers and public institutions across India's oil-and-gas sector. The stack survives because each layer is explicit and verified, not because the model is clever.
II. What I run
Three models are in active use. They do different things.
Claude handles architecture and review. It sees the whole codebase, holds the design conversation, weighs trade-offs, and gates what gets shipped. Its weakness is implementation speed at scale: syntax-heavy work burns through the context window and the weekly usage cap quickly.
Codex handles implementation. The unit of work is a precise brief: repo path, numbered steps, file paths, verification commands, review criteria. Codex runs the brief in a sandboxed workspace and returns a diff. Claude reviews the diff and decides whether to land it. The split keeps Claude in architecture mode and Codex in execution mode, which is how each is strongest.
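A brief of this shape can be held as a small data structure, so that an incomplete brief is caught before it is handed over. The following is a minimal sketch, not the format I or any tool actually uses: the `Brief` class, its field names, and the rendered layout are all illustrative assumptions derived from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """Hypothetical container for one unit of implementation work."""

    repo_path: str
    steps: list[str]
    file_paths: list[str]
    verification_commands: list[str]
    review_criteria: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that were left empty."""
        parts = {
            "repo_path": self.repo_path,
            "steps": self.steps,
            "file_paths": self.file_paths,
            "verification_commands": self.verification_commands,
            "review_criteria": self.review_criteria,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Render the brief as plain text with numbered steps."""
        lines = [f"Repo: {self.repo_path}", "", "Steps:"]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += ["", "Files:"] + [f"- {path}" for path in self.file_paths]
        lines += ["", "Done when these commands succeed:"]
        lines += [f"$ {cmd}" for cmd in self.verification_commands]
        lines += ["", "Review criteria:"]
        lines += [f"- {item}" for item in self.review_criteria]
        return "\n".join(lines)
```

A brief for which `missing_fields()` returns anything is not ready to send; the vagueness is caught in the brief rather than discovered in the diff.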
Gemini provides a third-line check: a long context window for whole-repo audits and long-manuscript review, and a tie-breaker when Claude and Codex disagree on a contested call.
A second layer sits underneath. MCP tools and local state give operating memory that persists between sessions. Scheduled jobs handle the long-running work without holding human attention. This persistent layer, which I call the brain, is the durable memory; the models are the working memory.
III. The verification loop
Every output gets checked against something that does not lie.
For code, the check is a verification command. Codex's brief always ends with "done when these commands succeed". If they fail, the brief was wrong or the implementation was wrong. There is no separate trust step.
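The "done when these commands succeed" rule can be sketched as a small runner. This is an illustration under assumptions, not the harness Codex itself uses: the function name `run_verification`, the argv-list command format, and the default timeout are all hypothetical choices.

```python
import subprocess
import sys


def run_verification(commands, cwd=None, timeout=600):
    """Run each command in order and stop at the first failure.

    `commands` is a list of argv lists, e.g. [["pytest", "-q"]].
    Returns (ok, results), where results holds (command, exit_code)
    pairs; exit_code is None if the command could not be started
    or timed out.
    """
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(
                cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
            )
            code = proc.returncode
        except (OSError, subprocess.TimeoutExpired):
            code = None
        results.append((cmd, code))
        if code != 0:
            # One failing command means the work is not done.
            return False, results
    return True, results


if __name__ == "__main__":
    # Stand-in command; a real brief would list its own checks.
    ok, results = run_verification([[sys.executable, "-c", "print('ok')"]])
    print("landable" if ok else "not done", results)
```

The runner returns a single boolean on purpose: either every command exits with status 0 or the work is not done, which leaves no room for a separate trust step.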
For factual claims, the check is a primary source: a live API, a brain file, a confirmed transaction. When the model states a number from memory, it is often wrong. When it retrieves the number from a tool and quotes it, the error rate in my experience falls by roughly an order of magnitude. The difference is mechanical, not motivational.
For voice, the check is a banned-words list and a writing-rules file that are loaded into the context every session, plus a Stop hook that scans the final output before it is sent. A mechanical guard.
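The scanning half of that guard is simple enough to sketch. The function below is an assumption about how such a scan could work, not the hook I actually run: `find_banned` and the sample word list are hypothetical, and wiring the function into a hook is left out.

```python
import re


def find_banned(text, banned_words):
    """Return the banned words or phrases that appear in `text`.

    Matching is case-insensitive and respects word boundaries, so a
    banned word inside a longer word does not trigger a hit.
    """
    hits = set()
    for word in banned_words:
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.add(word)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical list; the real one lives in a writing-rules file.
    banned = ["leverage", "delve", "synergy"]
    draft = "We will leverage the new pipeline."
    print(find_banned(draft, banned))
```

An empty result lets the output through; anything else blocks the send and names the offending words, so the fix is mechanical too.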
The shape that connects these: every model output gets checked against something durable before it leaves the system.
IV. What survives Monday morning
This part is undersold. A workflow that works in a demo is a different thing from a workflow that runs at 8.45 on Monday morning with a senior question incoming.
The things that survive look modest from outside.
A precise brief. Vague instructions to the model produce vague work. The vagueness is in the brief, not in the model.
A verification command. If the writer of the workflow cannot say how she would know it worked, she does not know.
Persistent memory between sessions. A workflow that has to be re-explained every day will be abandoned by the person who has to do the re-explaining.
Explicit role division. Each tool has a job, each output has a reviewer, each escalation has a path.
The things that do not survive: model rotation without role clarity, chat surfaces without persistent state, verification by vibes, and the assumption that the institutional environment will adapt to the AI rather than the other way around.
V. The institutional implications
Indian institutions ask a different question from tech offices. They ask how this survives procurement, governance, audit, training, and the day after the consultant leaves.
That is the actual job. AI adoption for serious institutions has to design for institutional reality from the first sprint. The workflow has to fit the format. The governance has to anticipate the audit. The training has to land on the people already doing the work, and on the future hires too.
When the operating loop is explicit, this is more tractable than it looks. When the operating loop is implicit, this is impossible.
The product stack matters. A controlled enterprise surface gives an institution something it can govern. A coding agent that takes implementation off the senior person's plate restores capacity at the top of the institution. An agent framing lets workflows run between people, instead of being trapped inside one. An API gives engineers a way to wire AI into the systems that already exist. None of this is enough on its own. All of it is necessary for operating practice, not theatre.
VI. From personal practice to institutional adoption
The reason I run this stack on myself first is that the lessons do not travel without proof. I will not tell a Union ministry what to adopt without having lived the discipline I am asking them to adopt.
The translation layer is what is thin in India. The country has institutions, talent, sectors, and a serious appetite for AI. What is missing is the connective work that turns model capability into governed operating practice. That work is mostly not coding. It is workflow discovery, executive translation, governance design, training, measurement, escalation paths, and the slow work of turning an experiment into the way work gets done.
That is the work I want to do at scale, for India's serious institutions, with the right product stack.
The model is one component. The operating loop is the whole thing.