Keep Deterministic Work Deterministic
This is the second article in a series on agentic engineering and AI-driven development. Read part one here, and look for the next article on April 2 on O’Reilly Radar. The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts […]

In AI-driven development and agentic engineering, keeping a system reliable when part of it is a language model has become a central concern for researchers and developers alike. This is the second article in a series exploring these concepts, building on the first piece and leading up to a third installment on April 2 on O’Reilly Radar. The series examines how to build systems that operate with precision and consistency even when they are powered by artificial intelligence.
The discussion starts from a well-known adage, often called the ninety-ninety rule and attributed to Tom Cargill of Bell Labs: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time." The joke is that the final, often overlooked parts of a project take far more effort than their size suggests. In AI-driven development the problem is amplified, because the system must not only be built but also refined until it behaves accurately and reliably.
One experiment that explores this is a blackjack simulation in which a large language model (LLM) plays hundreds of hands against blackjack strategies written in plain English. The model uses these strategy descriptions to decide whether to hit, stand, or double down in each hand. Meanwhile, deterministic code handles the card dealing, the arithmetic, and rule verification.
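As an illustration of the kind of work that belongs on the deterministic side, here is a minimal sketch of a hand-total function. It is not the code from the experiment; the name `hand_total` and the rank encoding (strings such as "A", "7", "K") are assumptions made for this example.

```python
def hand_total(ranks):
    """Return the best blackjack total for a list of card ranks.

    Hypothetical helper for illustration: ranks are strings such as
    "A", "2".."10", "J", "Q", "K".
    """
    total = 0
    aces = 0
    for rank in ranks:
        if rank == "A":
            aces += 1
            total += 11          # count each ace as 11 at first
        elif rank in ("J", "Q", "K"):
            total += 10          # face cards are worth 10
        else:
            total += int(rank)   # number cards are worth their face value
    # Demote aces from 11 to 1 while the hand would otherwise bust.
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


print(hand_total(["A", "7"]))
print(hand_total(["A", "7", "K"]))
```

A function like this returns the same total for the same cards every time, so the model never has to do the addition itself.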
Early iterations of this simulation revealed a 37% pass rate, indicating substantial room for improvement. The LLM frequently made errors in card total calculations, overlooked the dealer's turn, or disregarded the strategy it was supposed to follow. Crucially, these mistakes often compounded, leading to a domino effect of incorrect decisions. For instance, if the model miscounted the player's total on the third card, every subsequent decision would be based on an incorrect number, rendering the entire hand invalid.
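The compounding effect can be sketched with a little arithmetic. The example below assumes, purely for illustration, that every step in a hand succeeds independently with the same probability and that a single wrong step invalidates the hand. The 0.9 step accuracy and the step counts are made-up inputs, not measurements from the experiment.

```python
def hand_pass_rate(step_accuracy, steps):
    """Probability that every step in a hand is correct.

    Assumes independent steps with equal accuracy (a simplification).
    """
    return step_accuracy ** steps


# Illustrative inputs only: a 90%-accurate step repeated over longer hands.
for steps in (1, 3, 5, 10):
    print(steps, round(hand_pass_rate(0.9, steps), 3))
```

Under these assumptions, a step that is right nine times out of ten produces a fully correct ten-step hand only about a third of the time, which shows how a modest per-step error rate can turn into a low overall pass rate.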
To understand the scale of this problem, it helps to consider the March of Nines. This concept, coined by Andrej Karpathy from his experience building self-driving systems at Tesla, holds that reaching the first 90% of reliability is relatively straightforward, but every additional nine costs about as much again. Moving from 90% to 99% reliability takes roughly the same amount of engineering work as going from 99% to 99.9%, even though each step removes a smaller absolute share of the failures. Each additional nine demands comparable resources, and the process never truly ends.
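To make the nines concrete, the sketch below computes how many failures to expect at each reliability level. The function name and the choice of 1,000 trials are assumptions for illustration. It only shows what each nine buys in failures avoided, and says nothing about the engineering cost itself.

```python
def expected_failures(nines, trials=1000):
    """Expected failures in `trials` attempts at a reliability of `nines` nines.

    One nine is 90%, two nines is 99%, three nines is 99.9%, and so on.
    """
    failure_rate = 10 ** -nines
    return trials * failure_rate


for nines in (1, 2, 3):
    print(nines, expected_failures(nines))
```

Each additional nine cuts the expected failures by a factor of ten, so the second and third nines remove far fewer failures in absolute terms than the first one does.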
To demonstrate how such failures can compound, one can conduct a simple experiment using an AI chatbot running an early 2026 model, such as ChatGPT 5.3 Instant. By inputting a specific sequence of commands, users can observe firsthand the challenges in achieving consistent, deterministic outcomes in AI-driven systems.
In conclusion, building reliable, deterministic AI-driven systems is governed by the March of Nines. Initial progress can be swift, but the path to true reliability is long, and it requires meticulous attention to detail and a solid understanding of both the AI models and the deterministic systems they interact with. As the series continues, we will explore further insights and strategies for overcoming these challenges and building systems that operate with the precision and consistency that real-world applications need.










