AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines

I've recently updated towards substantially shorter AI timelines and much faster progress in some areas. [1] The largest updates I've made are (1) an almost 2x higher probability of full AI R&D automation by EOY 2028 (I'm now a bit below 30% [2] while I was previously expecting around 15%; my guesses are pretty reflectively unstable) and (2) I expect much stronger short-term performance on massive and pretty difficult but easy-and-cheap-to-verify software engineering (SWE) tasks that don't require that much novel ideation [3]. For instance, I expect that by EOY 2026, AIs will have a 50%-reliability [4] time horizon of years to decades on reasonably difficult easy-and-cheap-to-verify SWE tasks that don't require much ideation (while the high-reliability time horizon, for instance at 90%, will be much lower, more like hours or days than months, though this will be very sensitive to the task distribution).

In this post, I'll explain why I've made these updates, what I now expect, and implications of this update. I'll refer to "easy-and-cheap-to-verify SWE tasks" as ES tasks and to "ES tasks that don't require much ideation (as in, don't require 'new' ideas)" as ESNI tasks for brevity.

Here are the main drivers of my update:

Opus 4.5 and Codex 5.2 were both significantly above my expectations (on both benchmarks and other sources of information). This isn't that much of an update by itself, since we should expect some variation and some models to be decently large jumps, but then Opus 4.6 (and probably Codex 5.3 and

6 April 2026 at 05:27 pm

The author of a recent LessWrong post has shifted substantially towards shorter AI timelines and faster expected progress in some areas. The update is driven mainly by recent model releases and their performance on specific tasks. The two largest changes are an almost 2x higher probability of full AI research and development (R&D) automation by the end of 2028 (from around 15% to a bit below 30%, figures the author describes as reflectively unstable) and much stronger expected short-term performance on massive, fairly difficult but easy-and-cheap-to-verify software engineering (SWE) tasks that do not require much novel ideation.

The primary driver behind these updates is the performance of AI models such as Opus 4.5 and Codex 5.2, which were both significantly above the author's expectations on benchmarks and other sources of information. On its own this is only a modest update, since some variation between releases is expected, but the above-expectation results continued with Opus 4.6 and probably Codex 5.3 and 5.4. In 2025, the METR 50%-reliability time horizon doubled roughly every three and a half months, with an additional jump at the beginning of 2026, albeit with some uncertainty.
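To make the doubling figure concrete, here is a minimal Python sketch of what a steady 3.5-month doubling time implies. Only the 3.5-month figure comes from the text; the starting horizon of 4 hours and the 2,000-hour work-year are hypothetical values chosen for illustration, and the sketch does not model the additional early-2026 jump.

```python
import math

# From the text: the 50%-reliability time horizon doubled about every 3.5 months in 2025.
DOUBLING_MONTHS = 3.5


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth in the time horizon after `months` of steady doubling."""
    return 2.0 ** (months / doubling_months)


def months_to_reach(start_hours: float, target_hours: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months of steady doubling needed to grow from start_hours to target_hours."""
    return doubling_months * math.log2(target_hours / start_hours)


if __name__ == "__main__":
    # Hypothetical values, not taken from the post.
    start_hours = 4.0
    one_work_year_hours = 2000.0  # assumption: roughly 50 weeks x 40 hours

    print(f"Growth over 12 months: {growth_factor(12):.1f}x")
    print(f"Months from {start_hours:g} h to one work-year: "
          f"{months_to_reach(start_hours, one_work_year_hours):.1f}")
```

Under this assumption, twelve months of steady progress multiplies the time horizon by a bit under 11x, since 2^(12/3.5) is about 10.8.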

These advances are most visible in AIs accomplishing large and impressive software engineering tasks with only moderately sophisticated scaffolding. Tasks that would typically take humans months or even years have been completed by AI in a fraction of the time. The capability is not limited to simple tasks: it extends to difficult SWE tasks, provided they are easy to verify.

The implications of these updates are significant. By the end of 2026, the author expects AIs to have a 50%-reliability time horizon of years to decades on reasonably difficult easy-and-cheap-to-verify SWE tasks that do not require much ideation. The time horizon at high reliability (for instance, 90%) is expected to be much lower, more like hours or days than months. Both figures will be very sensitive to the task distribution.
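The gap between a 50%-reliability time horizon and a 90%-reliability one can be illustrated with a simple model. The Python sketch below assumes that success probability follows a logistic curve in the base-2 logarithm of task length; the function names and the slope and horizon values are hypothetical, chosen only to show the shape of the relationship, and are not figures from the post.

```python
import math


def success_prob(task_hours: float, h50_hours: float, slope: float) -> float:
    """Assumed logistic model: P(success) falls as log2(task length) rises.

    h50_hours is the task length at which success probability is exactly 0.5.
    slope is the (hypothetical) change in log-odds per halving of task length.
    """
    z = slope * (math.log2(h50_hours) - math.log2(task_hours))
    return 1.0 / (1.0 + math.exp(-z))


def horizon_at(reliability: float, h50_hours: float, slope: float) -> float:
    """Task length at which the model's success probability equals `reliability`."""
    logit = math.log(reliability / (1.0 - reliability))
    return h50_hours * 2.0 ** (-logit / slope)


if __name__ == "__main__":
    # Hypothetical inputs: a 50% horizon of one work-year and a shallow slope.
    h50 = 2000.0
    slope = 0.25
    for r in (0.5, 0.8, 0.9):
        print(f"{int(r * 100)}% horizon: {horizon_at(r, h50, slope):.1f} hours")
```

With a shallow slope, the 90% horizon can be orders of magnitude shorter than the 50% horizon, which is the kind of gap the text describes. A steeper slope narrows the gap, so the result depends heavily on the assumed curve and on the task distribution.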

The shift towards shorter timelines and faster progress in specific areas of AI development underscores the rapid pace of technological change. As AI models continue to improve and their capabilities expand, the potential for automation in research and development processes becomes increasingly realistic. This not only accelerates the pace of innovation but also has implications for industries reliant on software engineering, as AI becomes a more integral part of the development process.

In conclusion, the updated timelines reflect model performance that exceeded expectations and rapid progress on large, easy-to-verify software engineering tasks. AIs can now often complete such tasks, though far less reliably on long tasks than on short ones, and automation of R&D and SWE looks increasingly plausible. As these capabilities grow, the integration of AI into various industries is poised to change how research, development, and problem-solving are approached.

Source: LessWrong