AI Agents - Diligesker IT/AI Digest

TechCrunch’s report on Aaron Levie’s warning about “AI psychosis” among CEOs lands because it names a familiar gap: executives see a strong demo, while teams still have to make the work correct, safe, and shippable. AI productivity claims can sound persuasive before that last-mile work is counted. The issue is not whether AI agents are useful. They are. The question is whether companies can tell the difference between a good prototype and a finished business process.

The short version

Box CEO Aaron Levie argued that CEOs are especially vulnerable to overestimating AI because they sit far from the last mile of work.
Layoffs.fyi counted 115,430 tech layoffs across 152 companies in the first five months of 2026, close to the 124,636 total it tracked for all of 2025.
ClickUp CEO Zeb Evans said the company cut 22% of staff after deploying roughly 3,000 AI agents, a useful case study in how quickly the narrative is moving.
The hard part is measurement: more drafts, tickets, pull requests, or proposals do not automatically mean better output.
Hacker News readers mostly argued about two things: whether “psychosis” is a fair label, and whether executives understand the review work that AI creates.

What happened

The TechCrunch piece starts with Levie’s claim that CEOs are “uniquely prone to AI psychosis” because they are far enough away from frontline work to miss the remaining labor needed to turn AI output into value. That is the sharpest point in the article. A CEO can ask an agent to draft a contract, generate HTML, summarize a customer call, or produce a product mockup. Those outputs can look convincing in a meeting. They still need review, context, policy checks, security judgment, and someone willing to be accountable when the answer is wrong.

The article also puts that argument next to a rough labor-market backdrop. Layoffs.fyi’s tracker shows 115,430 tech layoffs from 152 companies in the first five months of 2026. That does not prove AI caused the layoffs. It does show why the story is sensitive: AI is becoming part of the language companies use when they explain smaller teams, faster execution, and new operating models.

ClickUp is the most concrete example in the report. CEO Zeb Evans said the company had deployed about 3,000 AI agents and reduced staff by 22%, while trying to build what he called a “100x org.” That framing is exactly why this debate matters for builders. If agents become part of the org chart, companies need a much better answer to a basic operating question: who reviews the agent’s work, and what happens when the agent is confidently wrong?

Why this is worth watching for AI productivity claims

The useful read is that AI adoption is moving faster than AI measurement. A team can count how many agent runs completed. It can count the number of documents, tickets, or pull requests generated. Those are activity metrics. They do not say much about whether the work reduced customer pain, lowered error rates, increased revenue per employee, or freed experts from low-value chores.

That distinction matters because the research record is still mixed. California Management Review’s summary of AI productivity evidence warns against easy claims that AI adoption produces broad productivity gains by itself. An NBER paper on executives and AI productivity points to a gap between perceived gains and measured outcomes. MIT FutureTech’s labor-task research also suggests that many tasks remain harder to automate at human-level quality than the demo cycle implies.

The management bottleneck may simply move. Harvard Business Review has made a similar point: if AI increases the volume of output, managers can become the constraint because more work needs to be read, compared, approved, or rejected. Anyone who has reviewed AI-generated code or AI-written legal text knows the pattern. The first draft arrives faster. The expensive part is deciding whether it can be trusted.

For more briefs on AI products, software teams, and workplace automation, see the IT & AI archive.

What Hacker News readers are arguing about

The Hacker News thread around the TechCrunch article is active and messy in the usual useful way. A large part of the discussion focuses on the word “psychosis.” Some readers called it clickbait or a cheap use of medical language. Others defended it as a cultural shorthand for executives becoming detached from what AI can actually do. The split is worth noting because it mirrors the broader AI debate: people agree there is overconfidence, then fight over how harshly to name it.

The more practical thread is about distance from the work. Several commenters argued that this is not new. Executives have long seen a toy example, assumed the hard part was solved, and pushed a rollout that frontline teams had to absorb. The AI-specific twist is that LLMs can flatter the user while producing a plausible artifact. A CEO who prompts a chatbot into a small front-end demo may come away feeling closer to engineering than they really are.

There was also a strong operator objection: AI can create review debt. One commenter described a CEO who hit real walls around data architecture and deployment after experimenting with AI prototyping. That is the sane version of the story. The tool helped explore an idea, then exposed the need for human-designed infrastructure. Another repeated concern was failure rate. If a model gets 80% or 90% of text tasks right, the remaining errors can still be disastrous in legal, security, finance, support, or production engineering contexts.

The thread is not evidence, but it is a useful sentiment check. Builders are not rejecting AI agents outright. They are rejecting the jump from “this generated something impressive” to “this can replace the people who know where the traps are.”

The practical read

Companies should treat AI productivity claims like product claims. Define the workflow, the baseline, the quality bar, and the failure mode before tying the result to headcount. If an agent writes support replies, measure refund errors, escalation rates, customer satisfaction, and policy violations. If it writes code, measure review time, defect rate, rollback frequency, and maintenance cost. If it drafts contracts, measure legal review burden and clause-level risk.

For AI agent startups and workplace apps, the pitch also needs to mature. “We deployed 3,000 agents” is a flashy number, but buyers will eventually ask which agents survived contact with real work. The products that win will probably be the boring ones that make review easier, preserve audit trails, route uncertain cases to humans, and prove that cycle time improved without hiding risk.

For workers, the signal is more personal. The safer skill is not prompt fluency by itself. It is judgment over the last 20%: checking the output, knowing the domain constraints, spotting the quiet mistake, and deciding when automation should stop.

Tag: AI Agents

AI productivity claims are running ahead of the work

Table of Contents

The short version

What happened

Why this is worth watching for AI productivity claims

What Hacker News readers are arguing about

The practical read

Sources