Semantic git

27 Mar 2026 • 3 min read

The core premise of git is that you version-control text, down to the individual character. If I change a single character in a file, that shows up as a diff.

This character-by-character version control works when humans write code, because character-level changes map to human intent: a one-line bug fix, a renamed variable, a tweaked condition. The entire culture of code review (PRs, line comments, git blame) is built on this assumption.

When AI generates code, diffs no longer map neatly to intent. You get all sorts of weird cases. Here are two failure modes I run into often.

Large diff, no behavioral change: An agent refactors a module. It flattens nested conditionals, extracts helpers, reorders methods, and renames a bunch of internal variables. The pull request shows 200 lines changed across a dozen files. You spend twenty minutes reviewing it and conclude that nothing meaningful has changed. The code does exactly what it did before.
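One way to get a cheap signal on a refactor like this is to run the old and new versions side by side on the same inputs. The sketch below is illustrative only: the two `shipping_fee_*` functions and the `behaves_the_same` helper are hypothetical names I made up to stand in for "before" and "after" code, and agreement on sampled inputs is evidence of equivalence, not proof.

```python
from itertools import product


def shipping_fee_nested(total: float, is_member: bool) -> float:
    """Hypothetical 'before' code: nested conditionals."""
    if is_member:
        if total >= 50:
            return 0.0
        else:
            return 2.0
    else:
        if total >= 50:
            return 5.0
        else:
            return 8.0


def shipping_fee_flat(total: float, is_member: bool) -> float:
    """Hypothetical 'after' code: the same rules as a flat lookup table."""
    fees = {
        (True, True): 0.0,
        (True, False): 2.0,
        (False, True): 5.0,
        (False, False): 8.0,
    }
    return fees[(is_member, total >= 50)]


def behaves_the_same(old, new, inputs) -> bool:
    """True if both callables return equal results for every sampled input."""
    return all(old(*args) == new(*args) for args in inputs)


# Sample around the threshold, where a refactor is most likely to slip.
sample_inputs = list(product([0, 49.99, 50, 120], [True, False]))
```

Calling `behaves_the_same(shipping_fee_nested, shipping_fee_flat, sample_inputs)` returns `True` here, even though a textual diff between the two functions would touch nearly every line.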

Tiny diff, enormous behavioral change: An agent makes a one-character change:

# Before
if attempts > 3:
    lock_account()
# After
if attempts >= 3:
    lock_account()

The diff looks trivial, but the behavior change is catastrophic. The account now locks when attempts reaches 3 rather than 4, so your users get locked out one attempt earlier than intended. In review, your eye slides right past it.
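To make the off-by-one concrete, here is a minimal sketch that finds the first attempt count at which each condition locks the account. The helper names are hypothetical, and I'm assuming `attempts` is an integer counter that starts at 1.

```python
def locks_before(attempts: int) -> bool:
    """Original condition: lock once attempts exceeds 3."""
    return attempts > 3


def locks_after(attempts: int) -> bool:
    """Changed condition: lock once attempts reaches 3."""
    return attempts >= 3


def first_locking_attempt(should_lock) -> int:
    """Return the smallest attempt count (starting at 1) that triggers a lock."""
    attempts = 1
    while not should_lock(attempts):
        attempts += 1
    return attempts


if __name__ == "__main__":
    print(first_locking_attempt(locks_before))  # 4
    print(first_locking_attempt(locks_after))   # 3
```

A one-character diff moves the threshold by a full attempt, which is exactly the kind of change a boundary test catches and a skim of the diff does not.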

These are small examples, but the problem compounds across a codebase full of AI-generated code. Character-level diffs give you no reliable signal about what to pay attention to and what to skip.

I've become one of those people who no longer review the code from coding agents. The review process feels like either wasted effort on noisy refactors or false confidence on changes I can't meaningfully evaluate at the character level.

Instead, I have resorted to other schemes.

The most consequential thing I've built is a Claude Code skill called semantic-git. It generates concise natural-language explanations of code as markdown files, colocated with the source:

repo/
├── OVERVIEW.semantic.md
├── ARCHITECTURE.semantic.md
├── .semantic-manifest.json
├── src/
│   ├── auth/
│   │   └── .semantic.md
│   ├── lib/
│   │   ├── processor.ts
│   │   ├── processor.semantic.md
│   │   └── utils/
│   │       └── .semantic.md

These files describe what the code does and why. They update automatically whenever the associated code changes. The skill hooks into the agent workflow so that any code modification triggers a corresponding doc update. The .semantic.md file becomes the artifact I actually review. I can glance at it and understand a large, multi-file change in seconds because it captures intent rather than implementation.
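As a rough sketch of how "code changed, so the doc needs updating" could be detected, the snippet below compares each source file against a content hash recorded in the manifest. This is not how the skill is actually implemented: the manifest shape (a flat JSON object mapping a source path to a SHA-256 digest) and the function names are assumptions made purely for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_sources(repo: Path, manifest_name: str = ".semantic-manifest.json") -> list[str]:
    """List source files whose content no longer matches the recorded digest.

    Assumed (hypothetical) manifest shape:
        {"src/lib/processor.ts": "<sha256 of the file when its doc was written>"}
    """
    manifest = json.loads((repo / manifest_name).read_text(encoding="utf-8"))
    stale = []
    for rel_path, recorded in sorted(manifest.items()):
        source = repo / rel_path
        # A missing file or a changed digest both mean the doc needs attention.
        if not source.exists() or file_digest(source) != recorded:
            stale.append(rel_path)
    return stale
```

Anything returned by `stale_sources(Path("repo"))` would be a candidate for regenerating its `.semantic.md` file before review.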

But even this approach keeps bumping up against the limitations of the tools beneath it. The .semantic.md files are still versioned with character-level diffs. What I actually want is a semantic diff. Not "line 14 changed from X to Y" but "the retry logic now applies to all request types, not just POSTs." The current tooling can't express that.
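A crude approximation of what I mean, assuming each .semantic.md lists behaviors as one markdown bullet per statement, is to diff the set of statements rather than the lines. The function names and the sample bullets below are invented for illustration, and this only compares exact wording, so a paraphrased statement still shows up as a removal plus an addition.

```python
def behavior_statements(markdown: str) -> set[str]:
    """Collect bullet lines ("- ...") as whitespace-normalized statements."""
    statements = set()
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            # Collapse whitespace so reflowed bullets still compare equal.
            statements.add(" ".join(stripped[2:].split()))
    return statements


def statement_diff(old: str, new: str) -> dict[str, list[str]]:
    """Report which behavior statements were removed and which were added."""
    old_set = behavior_statements(old)
    new_set = behavior_statements(new)
    return {
        "removed": sorted(old_set - new_set),
        "added": sorted(new_set - old_set),
    }


# Invented example documents, before and after a change to retry logic.
old_doc = "- Retries failed POST requests\n- Logs each failure\n"
new_doc = "- Logs each failure\n- Retries failed requests of any type\n"
```

Here `statement_diff(old_doc, new_doc)` reports one removed statement and one added statement, and ignores the fact that the bullets were reordered. A real semantic diff would need to understand meaning rather than match strings, which is the part the current tooling is missing.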

And the relationship between the docs and the code only flows in one direction. When code changes, the docs update. But when I edit a .semantic.md file to say "this module should also handle timeouts," nothing happens. The docs describe the code but don't control it. I want to edit the spec and have the code follow.

Which brings me back to where I started. If the LLM is a compiler, then prompts and specs are your source code, and the generated code is a build artifact. You version control the source. You review the source. You don't need to read every line of assembly your compiler emits. You need confidence that the compiler works and that your source says what you mean.

Not all software should be treated this way. You wouldn't want your cryptography library maintained by editing prose specs and trusting a compiler you can't fully verify. But most software written today — internal tools, CRUD apps, glue code, dashboards — might be better served by version-controlling the intent and letting the implementation be regenerated. I don't yet know what that tooling looks like. Right now, the .semantic.md files are a rough sketch of it. The spec as source, the code as shadow.