Tag: Code Review

  • Claude Code dynamic workflows make agents plan the work

    Claude Code dynamic workflows let Claude Code write a task-specific JavaScript harness, spawn subagents, and coordinate the result instead of keeping a long job in one chat thread. Anthropic introduced the feature on June 2, 2026, and frames it as a way to handle complex coding, research, security, triage, and verification work without forcing developers to build the orchestration layer by hand.

    The short version

    • Claude Code dynamic workflows create custom harnesses for a task, then use subagents to split, verify, compare, or synthesize work.
    • Anthropic names seven useful patterns: classify-and-act, fan-out-and-synthesize, adversarial verification, generate-and-filter, tournament, loop until done, and model routing.
    • The feature is aimed at complex, high-value jobs such as refactors, migrations, deep research, source checking, support triage, and root-cause analysis.
    • The trade-off is cost and complexity. Anthropic says dynamic workflows can use significantly more tokens and are not needed for ordinary coding tasks.

    What happened

    Anthropic says Claude Code can now create a custom harness on the fly for the job in front of it. The harness is a JavaScript file with special functions for spawning and coordinating subagents, plus ordinary JavaScript utilities such as JSON, Math, and Array for processing data. A workflow can choose which model each agent uses and whether subagents run in their own worktree, which matters when a task needs isolation or a higher-intelligence model.

    The company’s post describes this as a move beyond static orchestration. Developers could already coordinate multiple Claude Code runs through the Claude Agent SDK or claude -p, but those static harnesses tend to be generic because they have to survive many edge cases. Dynamic workflows push more of that planning into Claude Code itself: ask for a workflow, or use Anthropic’s trigger word “ultracode,” and Claude Code can build a structure for the current task.

    Why this is worth watching

    Claude Code dynamic workflows are worth watching because Anthropic is moving Claude Code from a single assistant loop toward task-level orchestration. In the June 2, 2026 post, Anthropic names three failure modes that show up in long agent runs: agentic laziness, self-preferential bias, and goal drift. Those are practical problems, not abstract benchmark issues.

    A separate harness gives Claude Code a cleaner way to check work against evidence and rubrics. One subagent can inspect logs, another can review files, another can verify claims, and a synthesis step can wait until each branch returns structured output. The feature will matter if that structure reduces missed requirements more often than it burns extra tokens. For more analysis of developer tooling and AI systems, see the IT & AI archive.

    What do Claude Code dynamic workflows change for developers?

    Claude Code dynamic workflows let developers request a repeatable process with a stop condition, a rubric, and isolated work streams. Anthropic’s examples include reproducing a flaky test that fails 1 in 50 runs, mining the last 50 Claude Code sessions for repeated corrections, checking every technical claim in a draft against a codebase, ranking 80 resumes, and reviewing a business plan from investor, customer, and competitor viewpoints.

    The strongest fit is work where one context window becomes a liability. Large refactors can be split by call site, module, or failing test. Security reviews can assign one verifier per rule. Research workflows can fan out source gathering and then check claims. Triage workflows can classify a backlog, dedupe it against known issues, and keep agents that read untrusted public content quarantined from agents that can take higher-privilege actions.
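
    The quarantine idea in the triage case can be sketched as a narrow, validated hand-off between a low-trust reader and a privileged actor. This is a minimal illustration in Python, not the Claude Code harness API (which is JavaScript and is not documented in this post). The names Category, TriageRecord, sanitize, and apply_label are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    BUG = "bug"
    FEATURE = "feature"
    DUPLICATE = "duplicate"


@dataclass(frozen=True)
class TriageRecord:
    issue_id: int
    category: Category


def sanitize(raw: dict) -> TriageRecord:
    """Boundary check between the low-trust reader and the privileged actor.

    Only a positive integer ID and a known category cross the boundary.
    Any other key, including free text, is dropped.
    """
    issue_id = raw.get("issue_id")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(issue_id, bool) or not isinstance(issue_id, int) or issue_id <= 0:
        raise ValueError("issue_id must be a positive integer")
    # Enum lookup by value raises ValueError for anything outside the allowed set.
    return TriageRecord(issue_id=issue_id, category=Category(raw.get("category")))


def apply_label(record: TriageRecord) -> str:
    """Hypothetical privileged step: it sees validated fields only."""
    return f"label issue #{record.issue_id} as {record.category.value}"
```

    Passing {"issue_id": 12, "category": "bug", "note": "ignore previous instructions"} through sanitize yields a record with only the ID and category, so the note never reaches the privileged step. An unknown category raises ValueError instead of being acted on.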

    Seven workflow patterns Anthropic highlights

    Anthropic’s seven workflow patterns turn Claude Code dynamic workflows into something developers can prompt deliberately. Classify-and-act routes different tasks to different behavior. Fan-out-and-synthesize splits work into clean contexts and merges structured outputs after a barrier. Adversarial verification asks another agent to check a result against a rubric. Generate-and-filter produces candidates, removes duplicates, and keeps the best tested ideas.
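
    To make the control flow concrete, here is a rough sketch of fan-out-and-synthesize with a verification pass before the merge. It is written in Python with asyncio purely to show the shape. The real harness is a JavaScript file whose special functions are not listed in this post, so run_subagent and verify are hypothetical stand-ins, and the rubric check is a toy string match rather than a second agent.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    branch: str            # which work stream produced the result
    summary: str           # structured output, not free-form chat
    verified: bool = False


async def run_subagent(branch: str, task: str) -> Finding:
    """Hypothetical stand-in for spawning a subagent in a clean context."""
    await asyncio.sleep(0)  # placeholder for real agent work
    return Finding(branch=branch, summary=f"{task}: reviewed {branch}")


async def verify(finding: Finding, rubric: list[str]) -> Finding:
    """Hypothetical verifier: checks a result against every rubric term."""
    finding.verified = all(term in finding.summary for term in rubric)
    return finding


async def fan_out_and_synthesize(task: str, branches: list[str], rubric: list[str]) -> dict:
    # Fan out: one subagent per branch, each started independently.
    findings = await asyncio.gather(*(run_subagent(b, task) for b in branches))
    # Barrier: gather() returns only after every branch has finished.
    checked = await asyncio.gather(*(verify(f, rubric) for f in findings))
    # Synthesize: merge only the findings that passed verification.
    return {
        "accepted": [f.summary for f in checked if f.verified],
        "rejected": [f.branch for f in checked if not f.verified],
    }
```

    Calling asyncio.run(fan_out_and_synthesize("audit", ["logs", "files"], ["audit"])) accepts both branches, while a rubric term that no summary contains moves both into the rejected list.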

    The remaining patterns handle comparison, persistence, and model choice. Tournament workflows make agents compete on the same task and use judging agents for pairwise comparisons. Loop-until-done workflows keep spawning work until no new findings or errors remain. Model and intelligence routing uses a classifier agent to decide whether a job needs a cheaper model or a stronger one such as Opus. The pattern list gives teams concrete language to use instead of vague prompts like “be thorough.”
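
    Two of these patterns reduce to small pieces of control logic. The sketch below shows a loop-until-done pass with a hard round cap and a routing function, again in Python for illustration only. find_issues is a hypothetical callback standing in for a spawned pass, and route_model uses a keyword check where Anthropic describes a classifier agent. The marker words and the "cheaper" and "stronger" labels are assumptions.

```python
from typing import Callable


def loop_until_done(
    find_issues: Callable[[int], set[str]],
    max_rounds: int = 5,
) -> tuple[set[str], int]:
    """Repeat passes until one adds nothing new or the round cap is hit."""
    seen: set[str] = set()
    for round_number in range(1, max_rounds + 1):
        new = find_issues(round_number) - seen
        if not new:
            return seen, round_number  # stop condition: no new findings
        seen |= new
    return seen, max_rounds  # bounded stop: cap reached


def route_model(
    task: str,
    hard_markers: tuple[str, ...] = ("migration", "security", "root cause"),
) -> str:
    """Hypothetical router: a keyword check standing in for a classifier agent."""
    text = task.lower()
    return "stronger" if any(marker in text for marker in hard_markers) else "cheaper"
```

    The round cap matters as much as the stop condition: without it, a pass that keeps inventing findings would keep the loop, and the token bill, running.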

    When not to use Claude Code dynamic workflows

    Claude Code dynamic workflows should not become the default for every prompt. Anthropic says the feature is new, best practices are still developing, and workflows may consume significantly more tokens. Most normal coding tasks do not need five reviewers, a tournament bracket, or a loop that keeps running until a broad condition is met.

    A good rule is to reserve workflows for jobs where the structure is part of the value. Use them when the task needs parallel evidence gathering, adversarial checking, repeated passes, isolated worktrees, or qualitative comparison at scale. Skip them for a small bug fix, a one-file change, or a question where a normal Claude Code session can answer cleanly. Token budgets can also be set directly in the prompt, such as asking the workflow to stay under 10,000 tokens.

    What Hacker News readers are arguing about

    The Hacker News submission for Anthropic’s post existed when checked, but it had no substantive discussion attached to it. That means there is no useful community consensus to summarize yet, and it would be misleading to turn a quiet thread into a debate.

    The missing discussion is still worth noting. The questions developers should bring to a fuller thread are predictable: whether dynamic workflows are reliable enough for real codebases, how often they waste tokens, how safe the worktree isolation is, whether adversarial verification catches real mistakes, and whether teams can share reusable workflows without turning them into brittle scripts. Treat the Hacker News link as a place to watch for later operator feedback, not as evidence today.

    The practical read

    Claude Code dynamic workflows are best understood as an orchestration feature for messy work. If your team already knows how to decompose a task, the feature may remove boilerplate around spawning agents and combining results. If your team does not know the right rubric, stop condition, or trust boundary, the workflow can still produce confident noise.

    The first experiments should be bounded. Try a flaky-test reproduction, a code review checklist, a migration with isolated worktrees, or a claim-verification pass on a technical document. Give Claude Code the workflow pattern you want, the token budget, the stop condition, and the rubric for success. Then inspect the transcript and saved workflow before using it on a higher-stakes job.

  • SQLite agentic code policy draws a hard line for AI patches

    SQLite added a plain rule to its repository guidance: it does not accept agentic code as a contribution. The project still welcomes bug reports that include a reproducible test case, which makes this less of an anti-AI manifesto and more of a maintenance boundary for a public-domain database used almost everywhere.

    The short version

    • SQLite’s AGENTS.md says the project does not accept agentic code, even though maintainers may review concise proof-of-concept patches before reimplementing changes themselves.
    • The project separates code contributions from bug reports: AI-assisted reports are acceptable when they include a reproducible test case.
    • The policy is tied to public-domain requirements, long-lived C code, Fossil-based development, and the cost of reviewing patches the maintainers did not write.
    • For AI coding tools, the useful lesson is blunt: a good repro may travel farther than a generated patch.

    What happened

    SQLite now has an AGENTS.md file aimed at people pointing coding agents at the SQLite source tree. The file explains project basics, build commands, testing commands, repository conventions, and contribution rules.

    The sharp part is the contribution policy. SQLite says it does not accept pull requests without prior agreement or legal paperwork that places the contribution in the public domain. It also says, in a separate sentence, that SQLite does not accept agentic code. Maintainers may still review a short, well-written pull request as a proof of concept, but the human SQLite developers reimplement accepted ideas themselves.

    That distinction matters because SQLite is not run like a typical GitHub-first project. Its canonical repository is Fossil, not Git, and its public-domain status is part of the project’s identity. A generated patch is not only a review burden. It can also blur authorship and provenance in a codebase that treats those details seriously.

    Why this is worth watching

    Most open source projects will not copy SQLite word for word. Plenty of maintainers do accept pull requests, and many projects live inside GitHub’s normal review flow. Still, SQLite has given maintainers a clean pattern: reject AI-written code as merge material while accepting AI-assisted evidence when it helps a human reproduce the problem.

    That is a useful split. A patch asks maintainers to trust the author, the code path, the licensing story, the tests, and the future maintenance cost. A reproducible bug report asks them to verify a failure. Those are different jobs.

    The wider lesson for developer tools is that output format matters. If an AI coding assistant produces a patch with no small failing test, it may be creating work for the maintainer. If it produces a minimal case, commands to reproduce it, and enough context for a person to inspect the failure, it has a better chance of being useful.

    For more coverage of developer-tool policy and AI engineering practice, see the IT & AI archive.

    What Hacker News readers are arguing about

    The Hacker News thread around Simon Willison’s write-up is small, so there is not enough there to claim a broad community consensus. The useful point in the comments is a clarification: SQLite is not refusing every artifact touched by an agent. It is refusing agent-written code as codebase input, while still allowing possible fixes to appear as documentation and accepting reproducible bug reports.

    A related earlier discussion on the prototype AGENTS.md commit framed the policy as a reasonable compromise. The tone was less “AI is banned” and more “give agent users rules, then keep generated code out of the project unless a human maintainer owns the final implementation.” That reading fits the file itself.

    The argument that remains open is practical. If AI tools get better at producing tests, minimization steps, and failure cases, maintainers may welcome them as triage tools. If the tools mostly produce plausible patches, projects with strict ownership rules will keep pushing back.

    SQLite agentic code policy in practice

    Agentic code is the wrong deliverable for SQLite. A reproducible test case is the right one.

    That should influence how developers use coding agents around mature open source infrastructure. Instead of asking an agent to “fix SQLite,” ask it to isolate the failing behavior, reduce the input, show the exact command that fails, and explain why the result conflicts with documented behavior. If a patch is generated along the way, treat it as a debugging note, not as something to submit.

    For coding-agent companies, this is also a product signal. The next useful feature may not be a bigger diff. It may be a maintainer-friendly report: environment, build command, failing test, expected result, actual result, and a short explanation a human can audit.
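
    As a rough picture of what such a report could contain, the sketch below runs a minimal case against a fresh in-memory database and renders the version, exact statements, expected result, and actual result. It uses Python's standard sqlite3 module only for illustration: that module exercises whichever SQLite library Python was built against, not necessarily the build under investigation, and nothing here reflects SQLite's own test tooling. ReproReport and run_repro are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ReproReport:
    sqlite_version: str     # environment: version of the linked SQLite library
    statements: list[str]   # exact SQL, in order, needed to trigger the behavior
    expected: list[tuple]   # what the documentation says should come back
    actual: list[tuple]     # what this build actually returned

    @property
    def reproduces(self) -> bool:
        """True when the observed rows differ from the documented ones."""
        return self.actual != self.expected

    def render(self) -> str:
        lines = [f"SQLite version: {self.sqlite_version}", "Steps to reproduce:"]
        lines += [f"  {statement};" for statement in self.statements]
        lines.append(f"Expected: {self.expected}")
        lines.append(f"Actual:   {self.actual}")
        return "\n".join(lines)


def run_repro(setup: list[str], query: str, expected: list[tuple]) -> ReproReport:
    conn = sqlite3.connect(":memory:")  # fresh database, no external state
    try:
        for statement in setup:
            conn.execute(statement)
        actual = conn.execute(query).fetchall()
    finally:
        conn.close()
    return ReproReport(sqlite3.sqlite_version, setup + [query], expected, actual)
```

    With a setup of "CREATE TABLE t(x INTEGER)" and "INSERT INTO t VALUES (1), (2), (3)", the query "SELECT sum(x) FROM t" and an expected value of [(6,)] give a report whose reproduces flag is False, which is the signal not to file anything. A real submission would also state the build command and platform, which this sketch leaves out.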

    The practical read

    If you maintain an open source project, SQLite’s policy is a good starting template even if you soften the wording. Say whether you accept AI-written patches. Say whether AI-assisted bug reports are allowed. Say what evidence makes a report useful. The policy does not need to be dramatic; it needs to reduce ambiguity before the first generated pull request lands.

    If you contribute to projects with AI help, submit less code and better evidence. A concise failing test and reproduction steps respect the maintainer’s time. A large generated patch shifts the risk to someone else.

  • CodeBoarding architecture diagrams map AI code review

    CodeBoarding architecture diagrams turn a repository into navigable Mermaid docs, with static analysis and LLM reasoning doing the first pass. The pitch is simple: if AI coding agents are changing more code, reviewers need a faster way to see the shape of the system before they approve the diff.

  • React Doctor wants to audit the React code AI agents leave behind

    React Doctor is an open source scanner for React projects that are getting more code from AI agents than humans can comfortably review line by line. It runs from the command line, reports issues across state, effects, performance, architecture, security, and accessibility, and can be wired into GitHub Actions for pull request feedback.

    The short version

    • React Doctor is published by Million.co under an MIT license and lives at millionco/react-doctor on GitHub.
    • The quick start is npx react-doctor@latest, which runs an audit from a project root without a long setup step.
    • Its pitch is narrower than a general linter: catch React-specific trouble that may slip through when agents generate code quickly.
    • The tool supports agent setup, GitHub Actions annotations, and diff-focused scanning for pull requests.
    • Treat it as a second reviewer, not a verdict machine. Static analysis can point at suspicious code, but a team still has to decide what matters.

    What happened

    Million.co has released React Doctor, a static analysis tool with the blunt tagline: “Your agent writes bad React, this catches it.” The README says it scans React codebases for issues across state and effects, performance, architecture, security, and accessibility. It also says the tool works across common React environments, including Next.js, Vite, TanStack, React Native, and Expo.

    The basic command is intentionally small: npx react-doctor@latest. After an audit, teams can run npx react-doctor@latest install to set up agent-facing guidance for tools such as Claude Code, Cursor, Codex, and OpenCode. There is also a GitHub Marketplace action for pull request annotations and comments.

    The repository was created in February 2026 and, when checked on May 28, showed more than 11,000 GitHub stars, hundreds of forks, and an MIT license. Those numbers can move quickly, but they are enough to show that this is not a quiet side note in the React tooling world.

    Why this is worth watching

    React Doctor lands in a gap that many frontend teams are starting to feel. AI coding tools can generate components, hooks, and refactors fast. The slow part is figuring out whether the result quietly introduced a stale effect dependency, an accessibility miss, a performance trap, or an unsafe pattern that only shows up later.

    Existing linters already catch plenty of mistakes. The interesting part here is the packaging: React Doctor talks like an audit tool for agent output, not a hand-tuned rule set that a team spends a week configuring. That framing matters. If agents are going to submit more pull requests, teams will want cheap automated friction before a human reviewer spends attention.

    For readers tracking developer tools, the IT & AI archive has more coverage of how coding agents are changing the review loop. React Doctor fits that same pattern: code generation is becoming normal, so code acceptance needs better guardrails.

    React Doctor in practice

    The first useful test is simple. Run React Doctor on a real project and read the false positives before wiring it into CI. A scanner that finds every possible smell can still waste a reviewer’s time if the signal is too noisy.

    The safer rollout is report-only mode on a few pull requests, then diff scanning for changed files once the team understands the output. The GitHub Action is the obvious place to start because reviewers already live inside pull requests. If the tool catches repeated issues, move those categories into a stronger policy. If a category is mostly noise, keep it as advisory or turn it off if the tool allows that path.
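
    One way to make that promotion decision less subjective is to tally how many findings per category a reviewer judged actionable during the report-only phase. The sketch below assumes the team records its own (category, actionable) pairs by hand. It does not parse React Doctor output, whose format is not described here, and the thresholds are arbitrary starting points.

```python
from collections import Counter


def plan_rollout(
    triaged: list[tuple[str, bool]],
    min_findings: int = 5,
    promote_at: float = 0.8,
    noise_below: float = 0.3,
) -> dict[str, str]:
    """Map each category to 'enforce', 'advisory', or 'off' from reviewer labels."""
    totals: Counter = Counter()
    useful: Counter = Counter()
    for category, actionable in triaged:
        totals[category] += 1
        if actionable:
            useful[category] += 1

    plan: dict[str, str] = {}
    for category, total in totals.items():
        rate = useful[category] / total
        if total < min_findings:
            plan[category] = "advisory"   # too little evidence to decide
        elif rate >= promote_at:
            plan[category] = "enforce"    # repeatedly useful: stronger policy
        elif rate < noise_below:
            plan[category] = "off"        # mostly noise
        else:
            plan[category] = "advisory"
    return plan
```

    With these defaults, a category with five actionable findings out of six is promoted, one with a single actionable finding out of five is switched off, and one with only two findings stays advisory until more pull requests have been reviewed.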

    This is especially relevant for teams using agents to touch React Native, Expo, or Next.js code. Those stacks have enough framework-specific behavior that a generic code review checklist often misses practical UI bugs.

    What Hacker News readers are arguing about

    There is a Hacker News submission for React Doctor, but it had no comments when checked through the public HN APIs. That means there is no real thread to summarize yet.

    The absence of debate is its own small warning. React developers should judge the tool on runs against production code, not on launch-day voting. The questions worth asking are concrete: How many findings are actionable? Does it duplicate ESLint, TypeScript, or existing React rules? Can it explain issues well enough for a junior developer or an agent to fix them safely?

    The practical read

    React Doctor is worth a trial if AI coding tools are already producing React changes in your repo. Start with npx react-doctor@latest on a branch, save the report, and compare the findings with issues your team has actually seen in reviews.

    Do not make it a required CI gate on day one. Put it beside ESLint and TypeScript first. If React Doctor repeatedly catches issues that your current checks miss, then promote the narrow categories that proved useful. That is the boring path, but it is also how static analysis becomes part of a workflow instead of another dashboard nobody trusts.
