Keep Deterministic Work Deterministic
This is the second article in a series on agentic engineering and AI-driven development. Read part one here, and look for the next article on April 2 on O’Reilly Radar. The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts […]

In AI-driven development and agentic engineering, keeping deterministic work reliable when an AI model is in the loop has become a central concern for researchers and developers alike. This is the second article in a series on these ideas; it builds on the first piece, and the next installment appears April 2 on O’Reilly Radar. The series examines how to build systems that behave precisely and consistently even when part of the work is done by artificial intelligence.
The discussion starts from a well-known adage, often called the ninety-ninety rule: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time." The percentages deliberately add up to 180: the final, often overlooked, parts of a project take far more effort than anyone plans for. In AI-driven systems, the challenges extend beyond those of traditional software development.
One experiment that explores this is a blackjack simulation in which a large language model (LLM) plays hundreds of hands, following blackjack strategies written in plain English. The model uses these strategy descriptions to decide whether to hit, stand, or double down in each hand, while deterministic code handles the card dealing, the arithmetic, and rule verification.
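The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the harness used in the experiment: every name here (`hand_total`, `new_deck`, `play_player_turn`, `stand_in_policy`) is hypothetical, doubling down is omitted, and the stand-in policy is a fixed placeholder where the LLM call would go.

```python
import random
from typing import Callable, List

# All names here are hypothetical; the article does not show its harness.

def hand_total(cards: List[int]) -> int:
    """Best blackjack total. Aces are stored as 1; one ace counts as 11 if that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        total += 10
    return total

def new_deck(rng: random.Random) -> List[int]:
    """Shuffled 52-card deck of card values (face cards are 10, aces are 1)."""
    deck = [min(rank, 10) for rank in range(1, 14)] * 4
    rng.shuffle(deck)
    return deck

def play_player_turn(deck: List[int], dealer_up: int,
                     decide: Callable[[int, int], str]) -> List[int]:
    """Deal and total in code; ask `decide` only for the hit/stand choice."""
    hand = [deck.pop(), deck.pop()]
    while hand_total(hand) < 21:
        choice = decide(hand_total(hand), dealer_up)
        # Verify the answer deterministically before acting on it.
        if choice not in ("hit", "stand"):
            raise ValueError(f"invalid decision: {choice!r}")
        if choice == "stand":
            break
        hand.append(deck.pop())
    return hand

def stand_in_policy(total: int, dealer_up: int) -> str:
    """Placeholder for the LLM call: hit below 17, otherwise stand."""
    return "hit" if total < 17 else "stand"

deck = new_deck(random.Random(0))
dealer_up = deck.pop()
hand = play_player_turn(deck, dealer_up, stand_in_policy)
print(hand, hand_total(hand))
```

In this sketch the decision function receives the total already computed, so it never has to add cards itself, and its answer is checked before the game state changes.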
Early iterations of this simulation had a 37% pass rate. The LLM frequently miscalculated card totals, skipped the dealer's turn, or ignored the strategy it was supposed to follow. These mistakes did not stay isolated; they compounded. If the model miscounted the player's total on the third card, for instance, every subsequent decision in the hand rested on incorrect information, rendering the entire game invalid.
To understand these reliability issues, it helps to consider the "March of Nines." The term comes from Andrej Karpathy, drawing on his experience building self-driving systems at Tesla: reaching the first 90% of reliability is relatively straightforward, but each further step, from 90% to 99% and then from 99% to 99.9%, takes roughly as much engineering effort as the one before it. Every additional nine costs as much as the last, and the process never truly ends.
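A little arithmetic shows why each nine matters when steps are chained. The sketch below assumes that every step in a hand succeeds independently with the same probability and that a hand has 10 steps; both are illustrative assumptions, not measurements from the experiment, and `hand_success_rate` is a hypothetical helper.

```python
def hand_success_rate(step_reliability: float, steps: int) -> float:
    """Chance that every step in a hand is correct, assuming independent steps."""
    return step_reliability ** steps

# Illustrative only: 10 steps per hand is an assumed figure.
for per_step in (0.9, 0.99, 0.999):
    per_hand = hand_success_rate(per_step, 10)
    print(f"{per_step:.3f} per step -> {per_hand:.3f} per hand")
```

Under these assumptions, 90% per-step reliability yields a fully correct hand only about 35% of the time, 99% yields about 90%, and 99.9% yields about 99%.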
To see how such failures can compound, try an AI chatbot running an early 2026 model, such as ChatGPT 5.3 Instant, and enter the following prompt:
"In a game of blackjack, the player's first two cards are 3 and 7. The dealer's up card is an 8. The strategy is to 'hit if the player's total is 21 or higher, otherwise stand.'"
Applied as written, the rule has only one correct answer: 3 + 7 = 10, which is below 21, so the player stands. The AI might instead suggest hitting, either because it miscounts the total or because it ignores the stated strategy in favor of conventional blackjack play. Either way, the single error propagates through the rest of the game, affecting every subsequent decision.
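For comparison, the quoted rule is a single addition and a single comparison, which ordinary code evaluates the same way every time. A minimal sketch, assuming plain numeric card values with no aces; `apply_quoted_rule` is a hypothetical name, and the rule is encoded exactly as it appears in the prompt:

```python
from typing import List, Tuple

def apply_quoted_rule(cards: List[int]) -> Tuple[int, str]:
    """Apply 'hit if the player's total is 21 or higher, otherwise stand' literally.

    Assumes plain numeric card values with no aces (a hypothetical simplification).
    """
    total = sum(cards)
    action = "hit" if total >= 21 else "stand"
    return total, action

print(apply_quoted_rule([3, 7]))  # (10, 'stand')
```

Whatever one thinks of the strategy itself, the code cannot miscount the total or substitute a different rule, which makes it a fixed reference point for checking a model's answer.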
For agentic engineering and AI-driven development, the March of Nines means reliability has to be earned through meticulous attention to detail and robust testing. As systems rely more heavily on AI, keeping them reliable and deterministic becomes a critical challenge. The blackjack simulation is a small-scale version of that problem, and it offers insight into what building trustworthy, consistent AI-driven systems involves.
As the series progresses, it will continue to explore strategies for overcoming these hurdles where AI and deterministic processes meet. The March of Nines is a reminder of the balance required between what artificial intelligence can do and the precision of traditional engineering. Understanding that balance is what lets us harness the full potential of AI-driven development while keeping the reliability and consistency that real-world applications need.










