Autoresearch and the experimental society
The most important thing happening in AI right now is not just the intelligence of the models, but the harnesses that make that intelligence usable.

This is where Autoresearch, a Python script released a few weeks ago, comes in. As described in EV#565, Autoresearch is an autonomous experimental loop that changes how knowledge is produced.
In its initial experiment, run by its creator Andrej Karpathy, Autoresearch worked on the training of a GPT-2-level model for two days, finding 20 genuine improvements that made training 11% faster. Shopify's CEO, Tobi Lütke, then ran it on an internal Shopify model. The result was impressive: Autoresearch ran 37 experiments overnight and produced a 0.8-billion-parameter model that outperformed the previous 1.6-billion-parameter version by 19%. Notably, Lütke is not a machine learning engineer, yet Autoresearch was simple enough for him to use effectively.
Autoresearch matters because it addresses two problems at once. First, it automates parts of the knowledge-production process, making experimentation more efficient and accessible. Second, it addresses the agent control problem: AI systems tend to drift when they are given an open-ended brief or optimized for the wrong metric. Autoresearch is designed to prevent this by keeping the agent tied to the strategic direction the human sets.
The human sets the destination and Autoresearch manages the execution, much like a self-driving car in which the passenger chooses where to go and the vehicle handles the driving. This balance between human guidance and AI execution is what lets teams use the full potential of AI without losing control.
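To make the division of labor concrete, here is a minimal sketch of how a human-written brief could constrain an agent. It is an illustration only, not Autoresearch's actual code: the `Brief` class, its fields, and `validate_proposal` are hypothetical names invented for this example. The idea is that the human fixes the metric, the scope, and the budget, and any proposal that steps outside them is rejected before it runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brief:
    """Hypothetical human-written brief: the 'destination' the agent cannot change."""
    metric: str            # the single number the human wants improved
    editable: frozenset    # the only settings the agent is allowed to touch
    max_experiments: int   # hard budget on how many experiments may run


def validate_proposal(brief, baseline, proposal, experiments_run):
    """Return (accepted, reason) for an agent's proposed configuration."""
    if experiments_run >= brief.max_experiments:
        return False, "budget exhausted"
    # Every key whose value differs between the baseline and the proposal.
    changed = {
        key
        for key in set(baseline) | set(proposal)
        if baseline.get(key) != proposal.get(key)
    }
    out_of_scope = changed - brief.editable
    if out_of_scope:
        return False, "out of scope: " + ", ".join(sorted(out_of_scope))
    if not changed:
        return False, "no change proposed"
    return True, "ok"


if __name__ == "__main__":
    brief = Brief(metric="val_loss", editable=frozenset({"lr"}), max_experiments=37)
    baseline = {"lr": 0.1, "dataset": "v1"}
    print(validate_proposal(brief, baseline, {"lr": 0.2, "dataset": "v1"}, 0))
    print(validate_proposal(brief, baseline, {"lr": 0.2, "dataset": "v2"}, 0))
```

In this sketch the agent is free to explore within `editable`, but it cannot swap the dataset or exceed the budget, which is one simple way to keep an open-ended search from drifting.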
Recognizing the broader applicability of Autoresearch, Andrej Karpathy spent the last month adapting it for knowledge work beyond machine learning. His goal was to create a system that could run structured, low-cost experiments on the kinds of decisions teams make weekly. He named this version AutoBeta and made the full playbook and skills available to paying members.
Autoresearch also addresses the measurement problem that often plagues AI experiments. It follows a clear cycle of hypothesis, testing, scoring, and iteration, so every change is judged against the same score and progress is measurable and consistent. This both streamlines experimentation and makes the results more reliable.
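The cycle of hypothesis, testing, scoring, and iteration can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Autoresearch implementation: `toy_score` is a made-up stand-in for a real evaluation such as a validation loss, and `propose`, `run_loop`, and the `lr` and `width` settings are hypothetical names chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Experiment:
    config: dict   # the configuration that was tested
    score: float   # its score (lower is better)
    kept: bool     # whether it replaced the current best


def toy_score(config):
    """Hypothetical evaluation: distance from an optimum the loop cannot see."""
    return (config["lr"] - 0.3) ** 2 + (config["width"] - 0.7) ** 2


def propose(config, rng):
    """Hypothesis step: nudge one setting of the current best configuration."""
    key = rng.choice(sorted(config))
    candidate = dict(config)
    candidate[key] = config[key] + rng.uniform(-0.2, 0.2)
    return candidate


def run_loop(baseline, n_experiments, seed=0):
    """Run hypothesis -> test -> score -> keep-or-discard for a fixed budget."""
    rng = random.Random(seed)
    best, best_score = dict(baseline), toy_score(baseline)
    log = []
    for _ in range(n_experiments):
        candidate = propose(best, rng)   # hypothesis
        score = toy_score(candidate)     # test and score
        kept = score < best_score        # keep only strict improvements
        if kept:
            best, best_score = candidate, score
        log.append(Experiment(candidate, score, kept))
    return best, best_score, log


if __name__ == "__main__":
    best, score, log = run_loop({"lr": 0.0, "width": 0.0}, n_experiments=37)
    print(f"kept {sum(e.kept for e in log)} of {len(log)} experiments, score {score:.4f}")
```

Because every candidate is scored by the same function and only strict improvements are kept, the best score can never get worse, and the log records exactly which experiments counted as genuine improvements.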
Autoresearch changes how we approach AI and knowledge production. By automating parts of the process while keeping the agent under control, it widens access to powerful AI tools. Applying it beyond machine learning lets teams in many industries make data-driven decisions more efficiently. As Andrej Karpathy continues to refine and expand AutoBeta, the potential for transforming knowledge work becomes increasingly clear. The future of AI is not just about building smarter models, but about creating the right tools to make those models useful and accessible to everyone.










