Tag: Engineering Management

  • LLM oriented engineering puts human context first

    LLM oriented engineering puts human context first

    LLM oriented engineering is less about making models write more code and more about protecting the parts of software work that still need human judgment. Yair Weinberger, writing from his work at Reindeer, argues that the scarce resource in AI-assisted teams is not typing speed. It is human context: the time and attention needed to understand architecture, say no to bad API changes, and keep generated work from spreading through the codebase.

    The short version

    • Weinberger frames human attention as the real bottleneck: LLMs can produce code, comments, documents, and PRs faster than people can read them.
    • His practical answer is stricter modeling discipline, especially around APIs and component boundaries.
    • Human code review alone does not scale when AI-generated pull requests grow, so teams need linters, LLM judges, tests, and smaller PRs.
    • PMs can use LLMs to prototype in isolated repositories, but product ideas that touch customers still need a slower modeling path before they reach production.
    • The sharpest claim is that AI multiplies both good and bad engineering habits. Weak structure now turns into debt faster.

    What happened

    Weinberger published a long X post under the phrase “LLM Oriented Engineering,” based on roughly 18 months of thinking about how Reindeer builds product in the LLM era. The post is not a tooling launch or a benchmark. It is a working theory for how a software organization should behave once generated code, documents, and PR descriptions become cheap.

    The starting point is simple: people have limited context windows too. If LLMs fill the organization with bloated comments, verbose documents, and sprawling pull requests, the next human reviewer gets less signal. Then the next model reads that noisy context and copies the pattern.

    That is why Weinberger puts modeling at the center. Translating a customer user journey into API flows, components, and boundaries is still human work. A model can add a convenient field to an API in seconds. The team may then have to support that field as a public contract for years.

    Why this is worth watching

    A lot of AI coding discussion still treats productivity as the main question. The more interesting question is what happens after productivity rises. LLM oriented engineering gives that problem a name: the team does not run out of code, it runs out of readable context.

    The post also pushes back on the idea that review can stay mostly human. Weinberger’s view is blunt: people cannot beat LLM output volume by reading harder. Absolute rules, such as forbidden service dependencies, belong in linters. Softer contracts can be checked by LLM judges on clean context. Humans should spend their attention on modeling changes, API changes, and other load-bearing decisions.

    One useful phrase from the post is “padded rooms.” These are parts of the system where LLMs can move fast because mistakes do not create long-term dependencies. Customer-specific work and experiments can live there. Core architecture should not.

    That distinction matters for anyone building coding agents or developer tooling. The product does not only need a better autocomplete loop. It needs workflows that separate throwaway experiments from production contracts, and it needs review surfaces that make human attention easier to spend. For more coverage of AI and developer tools, the IT & AI archive is the closest internal reference point.

    What the discussion is missing

    I could not find a matching Hacker News thread for this specific post, so there is no public HN argument to summarize. The missing debate is still obvious enough: Weinberger is describing a company that already has a strong internal engineering culture, strong tests, and enough discipline to keep prototypes away from production.

    That is the hard part to generalize. A small team can say “use padded rooms” and still let customer work leak into core code because the customer is loud, the deadline is real, and the AI-generated patch appears to work. A larger team can add LLM judges and still end up trusting a model that checks the wrong thing.

    The post would be stronger with concrete examples of the enforcement layer: what a useful LLM judge prompt checks, what gets blocked by linters, and how the team decides that an API change is load-bearing enough for human review. Without those examples, the argument is directionally useful but still a playbook outline.

    LLM oriented engineering, in practice

    There are five habits worth pulling out of the post.

    First, keep organizational text tight. If a comment or PR description explains history instead of the result, it probably costs more attention than it saves.

    Second, treat APIs as contracts. A field that helps one generated patch can become a long-running support burden.

    Third, make pull requests small enough to read. If a reviewer cannot hold the change in their head, the approval is mostly theater.

    Fourth, invest in reward functions. In software work, that means useful tests, end-to-end coverage where it matters, evals for LLM-backed features, and automated review that starts from clean context.

    Fifth, isolate experiments. Let PMs and agents build fast demos, but make production adoption a separate modeling decision.

    None of this is glamorous. That is the point. LLM oriented engineering is not a new layer of magic on top of software teams. It is old engineering hygiene under much higher output pressure.

    The practical read

    If your team is adopting coding agents, start by mapping which parts of your codebase are load-bearing. APIs, shared data models, permission boundaries, and core workflows should get slower review. UI experiments, customer-specific adapters, and disposable prototypes can move faster if they stay isolated.

    Then look at the review burden. If AI has made PRs bigger, comments longer, and docs noisier, you have not gained as much leverage as it looks. You have moved work from typing to comprehension.

    The practical test is simple: can a new engineer, or a clean-context review agent, understand why the system is shaped the way it is? If not, more generated code will make the team feel faster while making the product harder to change.

    Sources

  • Boring technology matters more when AI writes the code

    Boring technology matters more when AI writes the code

    Boring technology is not a nostalgia play. Aaron Brethorst argues that AI coding tools make the old “choose boring technology” rule more useful, because generated code is easier to trust when your team can actually review it. The uncomfortable part is simple: AI can write code for stacks you do not understand, but it cannot give your team the judgment it skipped.

    The short version

    • Brethorst revisits Dan McKinley’s 2015 “Choose Boring Technology” essay and applies it to Claude, Copilot, and agentic coding tools.
    • The risk is not that AI writes bad code. The risk is that it writes plausible code in unfamiliar stacks, where teams have weak review instincts.
    • Boring technology works well with AI because known tools have known failure modes, docs, operational patterns, and people who can spot odd suggestions.
    • The useful question for a new stack is: if AI generated this implementation, could the team review it without guessing?

    What happened

    Brethorst’s post starts from McKinley’s idea of “innovation tokens”: teams can afford only a limited number of new, risky technical choices before their ability to operate the system gets worse. A new language, a new framework, and a new infrastructure model in the same project may feel exciting, but every unknown adds review cost.

    AI coding assistants change the feel of that tradeoff. Claude or Copilot can produce professional-looking code for Kubernetes, GraphQL federation, Rails, JavaScript, or a framework the team barely knows. That makes the unfamiliar stack look cheaper than it is. The generated code may run. It may follow naming conventions. It may include error handling. None of that proves the design is safe, maintainable, or idiomatic.

    Brethorst’s practical rule is blunt: use AI as a multiplier for stacks you already understand. If the team knows Rails, AI-generated Rails code is easier to check. If the team knows JavaScript, Copilot’s suggestions can be reviewed against real language knowledge. In a stack nobody understands, the tool becomes a confidence machine.

    Why this is worth watching

    Boring technology has a different meaning in the AI coding era. It does not mean old for the sake of old. It means the team knows how it fails, where to find answers, which APIs are deprecated, how performance problems usually show up, and what production pain looks like at 3 a.m.

    That matters because AI-generated code has become tidy enough to hide its own problems. Bad code used to look suspicious. Now the risky version may look clean, because the model has learned the surface shape of good code. The reviewer still needs taste, context, and memory of prior failures.

    For more software and AI briefings, the IT & AI archive tracks similar stories about developer tools, AI infrastructure, and product engineering choices.

    What Hacker News readers are arguing about

    The Hacker News thread is tiny, so there is no broad community verdict to report. The one useful comment points to Django as an example of boring technology that still makes a developer more productive.

    That small reaction fits the essay better than a noisy debate would. The point is not that every team should pick Django, Rails, Postgres, or any other specific default. The point is that mature tools often pair better with AI coding assistants because the human reviewer has a sharper baseline. The discussion does not prove the argument, but it shows the kind of practical response the essay invites: name the stack you know well enough to trust yourself around.

    The practical read for boring technology

    A team evaluating AI coding tools should separate two decisions that often get mixed together. One decision is whether AI can speed up the work. The other is whether the team can review the output.

    If a project already uses a familiar stack, AI can help with boilerplate, tests, migrations, refactors, and repetitive glue code. If the project also introduces a new framework or infrastructure pattern, slow down. Build a small internal test first. Ask someone to review the generated code without running to the docs every two minutes. If that review is mostly vibes, the stack is not ready for core production work.

    Boring technology is a review strategy. It gives AI less room to fool the team and gives humans more chances to catch the mistake before customers do.

    Sources