
Keep Deterministic Work Deterministic


7 April 2026 at 08:56 am

In AI-driven development and agentic engineering, keeping deterministic work reliable has become a central concern for researchers and developers. This is the second article in a series on these topics. It builds on the first piece, and the next installment is due on April 2 on O’Reilly Radar. The series looks at how to build systems that behave precisely and consistently even when an AI model is part of the loop.

The discussion starts from a well-known adage, often called the ninety-ninety rule: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time." The point is that finishing the last, easily overlooked parts of a project takes far more effort than their size suggests. In AI-driven systems, the difficulty goes beyond that of traditional software development.

One experiment in this exploration is a blackjack simulation in which a large language model (LLM) plays hundreds of hands using blackjack strategies written in plain English. The model reads each strategy description and decides whether to hit, stand, or double down in each hand. Deterministic code handles the card dealing, the arithmetic, and the rule verification.
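A minimal sketch of that division of labor, not the experiment's actual code. The names `hand_total` and `next_move`, the `decide` callback that stands in for the LLM call, and the card encoding are all hypothetical. The model is asked only for a decision, while the code computes the total and rejects any move that is not legal.

```python
from typing import Callable, List

LEGAL_MOVES = {"hit", "stand", "double"}


def hand_total(cards: List[int]) -> int:
    """Deterministic total for a hand.

    Assumed encoding: cards are ranks 1-10 (face cards passed as 10).
    One ace (rank 1) counts as 11 when that does not bust the hand.
    """
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total


def next_move(cards: List[int], dealer_up: int,
              decide: Callable[[int, int], str]) -> str:
    """Ask the model for a decision, but compute and validate in code."""
    total = hand_total(cards)  # arithmetic stays out of the model
    move = decide(total, dealer_up).strip().lower()
    if move not in LEGAL_MOVES:  # rule check stays out of the model
        raise ValueError(f"illegal move from model: {move!r}")
    return move


# A stub stands in for the LLM call so the sketch runs offline.
print(next_move([3, 7], 8, lambda total, dealer_up: "Hit"))  # prints "hit"
```

Passing the precomputed total to `decide` means the model never has to add cards itself, which removes one of the failure modes the experiment observed.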

Early iterations of this simulation had a 37% pass rate. The LLM frequently miscalculated card totals, skipped the dealer's turn, or ignored the strategy it was supposed to follow. These mistakes were not isolated: they compounded, with each error feeding the next. For instance, if the model miscounted the player's total on the third card, every subsequent decision in that hand rested on a wrong number, which made the whole hand invalid.
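To get a feel for how quickly small errors compound, assume purely for illustration that every step in a hand succeeds independently with the same probability. The independence assumption, the function name, and the numbers below are hypothetical and are not taken from the experiment.

```python
def hand_pass_rate(step_accuracy: float, steps_per_hand: int) -> float:
    """Chance that every step in a hand is correct, assuming the steps
    fail independently with the same per-step accuracy."""
    return step_accuracy ** steps_per_hand


# Illustrative numbers only: a model that is right 90% of the time per step.
for steps in (1, 5, 10):
    print(steps, round(hand_pass_rate(0.9, steps), 3))
```

Under these assumptions, a per-step accuracy that sounds acceptable still leaves most ten-step hands with at least one error.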

To understand these reliability issues, it helps to consider the "March of Nines." The term was coined by Andrej Karpathy, drawing on his experience building self-driving systems at Tesla. Reaching the first 90% of reliability is relatively straightforward. Going from 90% to 99%, and then from 99% to 99.9%, takes roughly as much engineering effort again at each step. Every additional nine costs about as much as the one before it, and the process never truly ends.
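The arithmetic behind each nine can be sketched as follows. The function name and the choice of 1,000 hands are hypothetical and only illustrate the scale: each added nine removes nine tenths of the remaining failures, so the failures that are left become steadily rarer and harder to find.

```python
def expected_failures(hands: int, nines: int) -> float:
    """Expected failed hands at a reliability of `nines` nines
    (1 -> 90%, 2 -> 99%, 3 -> 99.9%)."""
    failure_rate = 10.0 ** (-nines)
    return hands * failure_rate


for nines in (1, 2, 3):
    print(f"{nines} nine(s): about "
          f"{expected_failures(1000, nines):.0f} failed hands per 1000")
```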

To see how such failures compound, try an AI chatbot running an early 2026 model, such as ChatGPT 5.3 Instant, and enter the following prompt:

"In a game of blackjack, the player's first two cards are 3 and 7. The dealer's up card is an 8. The strategy is to 'hit if the player's total is 21 or higher, otherwise stand.'"

Taken literally, the stated rule says to stand, because 3 + 7 = 10 is below 21. The AI might instead suggest that the player hit, reasoning that 10 is safely below 21, which ignores the strategy it was given. If the model also miscounts the total, for example as 20, its advice rests on a wrong number. A single error of either kind propagates through the rest of the hand and affects every subsequent decision.
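The total and the rule check in this example are the kind of work that deterministic code can take over from the model. A minimal sketch, using a hypothetical function name, that applies the rule exactly as it is quoted in the prompt above:

```python
from typing import List


def apply_quoted_rule(cards: List[int]) -> str:
    """Apply the prompt's rule literally:
    hit if the total is 21 or higher, otherwise stand."""
    total = sum(cards)  # computed in code, so it cannot be miscounted
    return "hit" if total >= 21 else "stand"


print(apply_quoted_rule([3, 7]))  # prints "stand"
```

For a 3 and a 7 the code returns "stand" every time, regardless of what intuition about blackjack would suggest, so a model's answer can be checked against it.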

In agentic engineering and AI-driven development, the March of Nines is a reminder that reliability comes from attention to detail and thorough testing. As systems rely more heavily on AI, keeping them reliable, and keeping their deterministic parts deterministic, becomes a critical challenge. The blackjack simulation is a small-scale version of that problem, and it shows what building trustworthy, consistent AI-driven systems involves.

As the series continues, it will explore strategies for overcoming these hurdles at the boundary between AI and deterministic processes. Working through the March of Nines means balancing what AI models do well against the precision of traditional engineering. Understanding that balance is what makes it possible to use AI-driven development while keeping the reliability and consistency that real-world applications require.

Source: Radar