Structural Quality Gaps in AI Governance Prompts

Published April 24, 2026


Christo Zietsman

nuphirho is the personal blog of Christo Zietsman.

The name

Nu, phi, rho. Three Greek letters I picked up studying physics and mathematics, and carried through a Masters in Continuum Mechanics (Cum Laude, Stellenbosch University). They stuck as a username during university and never left. The name reflects where I started: grounded in rigour, pattern recognition, and first principles thinking.

What I write about

I build systems that help teams deliver reliable software at pace. I have 20+ years of experience spanning mine seismology, expert systems for antenna design, enterprise backup, and cybersecurity. This blog explores the intersection of AI-assisted software delivery, engineering process, and organisational transformation. The recurring theme: process matters more than technology. AI has changed the economics of rigorous engineering practices, making things like executable specifications, mutation testing, and formal verification layers viable in ways they were not before. I write about what I am working on, what I am learning, and what I get wrong.

On AI assistance

I think in systems and architecture. I do not always communicate those ideas clearly on the first pass. AI helps me bridge that gap. The thinking is mine. The clarity is a collaboration. This blog is written the same way I believe software should be built. I do the thinking, the decisions, the direction. AI assists with research, drafting, and refinement. The accountability is mine.

Most AI agent governance documents in production are structurally incomplete. That is not a casual observation. It is an empirical finding.

A second paper from the nuphirho.dev research programme is now on arXiv. "Structural Quality Gaps in Practitioner AI Governance Prompts: An Empirical Study Using a Five-Principle Evaluation Framework" (arXiv:2604.21090) applies a five-principle evaluation framework to 34 public AGENTS.md files sourced from GitHub. Three independent language model evaluators scored each document. Thirty-seven percent scored below the structural completeness threshold.
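As a rough illustration of how a headline figure like that can be computed from several evaluators, here is a minimal sketch. The score scale, the threshold value, the averaging rule, and the file names are all assumptions made for the example; the paper defines its own scoring, and nothing below reproduces its data.

```python
from statistics import mean

# Hypothetical threshold on a hypothetical 0..1 scale (not taken from the paper).
THRESHOLD = 0.6


def share_below_threshold(scores: dict[str, list[float]],
                          threshold: float = THRESHOLD) -> float:
    """Return the fraction of documents whose mean evaluator score is below threshold.

    `scores` maps a document name to one score per evaluator.
    Averaging across evaluators is an assumption, not the paper's stated method.
    """
    if not scores:
        raise ValueError("no documents to score")
    below = [name for name, per_eval in scores.items() if mean(per_eval) < threshold]
    return len(below) / len(scores)


# Made-up scores from three evaluators for four made-up files.
example = {
    "repo-a/AGENTS.md": [0.8, 0.7, 0.9],
    "repo-b/AGENTS.md": [0.4, 0.5, 0.3],
    "repo-c/AGENTS.md": [0.6, 0.7, 0.65],
    "repo-d/AGENTS.md": [0.2, 0.5, 0.4],
}

print(f"{share_below_threshold(example):.0%} of documents fall below the threshold")
```

Averaging is only one way to combine evaluators. A majority vote or a per-evaluator breakdown would be equally plausible, and each would give a different percentage on the same data.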

The five principles the framework evaluates are drawn from computability theory, proof theory, and Bayesian epistemology. Together they ask whether a governance document defines a decidable problem: does it specify success criteria, embed an assessment schema, require external verification of factual claims, scope what the agent will not do, and constrain the output format? A document that fails these tests will produce coherent output. Whether that output is correct depends entirely on what the document did not say, and there is no mechanism inside the document to catch what was left out.
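The five questions above can be written down as a simple checklist. This is a sketch of the idea, not the framework itself: the class name, field names, and the all-or-nothing rule for completeness are my own assumptions for illustration, and the paper's scoring may well be graded rather than boolean.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StructuralChecklist:
    """One boolean per question in the paragraph above (hypothetical field names)."""

    specifies_success_criteria: bool
    embeds_assessment_schema: bool
    requires_external_verification: bool
    scopes_what_agent_will_not_do: bool
    constrains_output_format: bool

    def missing(self) -> list[str]:
        """Names of the checks this document does not satisfy."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_structurally_complete(self) -> bool:
        """Assumption: complete means every check passes."""
        return not self.missing()


# A made-up document that sets success criteria and an output format,
# but says nothing about verification, scope, or assessment.
doc = StructuralChecklist(
    specifies_success_criteria=True,
    embeds_assessment_schema=False,
    requires_external_verification=False,
    scopes_what_agent_will_not_do=False,
    constrains_output_format=True,
)

print(doc.is_structurally_complete())
print(doc.missing())
```

The `missing()` list is the useful part. It names what the document did not say, which is the one thing the document cannot tell you about itself.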

The central finding is an artefact classification gap. The same file format, AGENTS.md, is being used for three architecturally incompatible purposes: task orchestration, behavioural governance, and architectural specification. There is no consensus on which of these the file is for. Different teams are solving different problems with the same artefact and calling it the same thing. That is a tractable requirements engineering problem with no current solution.

This paper is the empirical foundation for the PromptQ framework, which evaluates governance documents at authorship time. The framework is in development at promptq.ai.

arXiv:2604.21090 | doi.org/10.48550/arXiv.2604.21090

#promptq #ai-governance #research
