Measuring Goodhart’s law
Goodhart’s law famously says: “When a measure becomes a target, it ceases to be a good measure.” Although originally from economics, it’s something we have to grapple with at OpenAI when figuring out how to optimize objectives that are difficult or costly to measure.

The principle is now recognized well beyond economics. It applies directly to machine learning, where systems are trained to maximize an explicit objective and can exploit any gap between that objective and what their designers actually want, and it is a practical concern at OpenAI, which develops advanced AI systems. For anyone optimizing a complex objective, Goodhart’s law is a reminder of the pitfalls of focusing too narrowly on specific metrics.
The law is named after Charles Goodhart, a British economist who articulated it in 1975 in the context of UK monetary policy. His original formulation was that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes”; the more familiar wording quoted above is a later paraphrase, commonly attributed to the anthropologist Marilyn Strathern. The underlying observation is that when policymakers set targets based on economic indicators, those indicators often become distorted, with unintended consequences. For instance, if a government set a target for agricultural output, farmers might maximize short-term yield at the expense of sustainable practices, ultimately harming long-term productivity.
In AI development, Goodhart’s law is a critical consideration. Researchers and engineers must define and measure objectives that are inherently complex or difficult to quantify. For example, the quality of a system that generates human-like text might be measured by grammatical correctness, coherence, or even humor. If the team optimizes solely for these metrics, however, the resulting text may become formulaic or miss the nuances of human language: it scores well on the measures while drifting away from the goal they were meant to capture.
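To make this failure mode concrete, here is a minimal toy sketch. It assumes a single knob `x` standing for how hard the system pushes on a metric, a proxy that keeps rewarding larger `x`, and a hypothetical true objective that agrees with the proxy at first but peaks at `x = 1`. None of these functions come from OpenAI's work; they only illustrate how an optimizer that sees nothing but the proxy keeps "improving" after the true value has started to fall.

```python
# Toy illustration of Goodhart's law: an optimizer that only sees a proxy.
# Hypothetical assumptions (not from any real system):
#   - one knob x stands for "how hard the system pushes on the metric";
#   - the proxy keeps rewarding larger x:          proxy(x) = x
#   - the true objective peaks at x = 1 and falls:  true_value(x) = x - x**2 / 2


def proxy(x: float) -> float:
    # The measured metric: it improves every time x grows.
    return x


def true_value(x: float) -> float:
    # What we actually care about: improves up to x = 1, then declines.
    return x - x * x / 2


def optimize_proxy(steps: int, step_size: float = 0.25) -> list[tuple[float, float, float]]:
    """Greedy hill-climbing on the proxy; returns (x, proxy, true) after each step."""
    x = 0.0
    history = []
    for _ in range(steps):
        candidate = x + step_size
        # The optimizer accepts a move only if the proxy improves;
        # true_value is never consulted.
        if proxy(candidate) > proxy(x):
            x = candidate
        history.append((x, proxy(x), true_value(x)))
    return history


if __name__ == "__main__":
    for x, p, t in optimize_proxy(steps=12):
        print(f"x={x:5.2f}  proxy={p:5.2f}  true={t:6.3f}")
```

Running it shows the proxy rising at every step, while the true value peaks at `x = 1.0` and turns negative once `x` passes 2.0.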
Goodhart’s law also affects how researchers evaluate and compare models. In recent years, benchmarking AI performance with standardized tests and metrics has become central to measuring progress. Benchmarks provide valuable insights, but they also give developers an incentive to optimize for the benchmark itself: repeatedly tuning and selecting models by benchmark score can inflate that score without corresponding gains in broader, more meaningful capability.
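The benchmark concern can be illustrated with a small simulation, a sketch under purely hypothetical assumptions: each candidate model has a true skill drawn from a standard normal distribution, and its public benchmark score and private held-out score are that skill plus independent Gaussian noise. Picking the model with the best public score tends to overstate how it will do on held-out data, a selection effect often called the winner's curse, and the gap grows with the number of candidates compared.

```python
import random
import statistics

# Hypothetical model of benchmark selection (illustration only):
#   true skill ~ N(0, 1); public score = skill + N(0, noise);
#   held-out score = skill + independent N(0, noise).


def select_by_benchmark(num_models: int, noise: float, rng: random.Random) -> tuple[float, float]:
    """Return (public score, held-out score) of the model with the best public score."""
    best_public, best_held_out = float("-inf"), 0.0
    for _ in range(num_models):
        skill = rng.gauss(0.0, 1.0)
        public = skill + rng.gauss(0.0, noise)
        held_out = skill + rng.gauss(0.0, noise)
        if public > best_public:  # selection looks only at the public benchmark
            best_public, best_held_out = public, held_out
    return best_public, best_held_out


def average_gap(num_models: int, noise: float = 1.0, trials: int = 2000, seed: int = 0) -> float:
    """Mean of (public - held-out) for the selected model over many repeated selections."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(trials):
        public, held_out = select_by_benchmark(num_models, noise, rng)
        gaps.append(public - held_out)
    return statistics.mean(gaps)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:>3} candidates: average public-minus-held-out gap = {average_gap(n):.2f}")
```

With a single candidate the average gap is close to zero, because nothing was selected; with many candidates, the winner's public score systematically exceeds its held-out score.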
To address these challenges, organizations like OpenAI are exploring alternative approaches to measuring and optimizing AI objectives. One relevant technique is reward shaping, used in reinforcement learning when the true reward is sparse or delayed: the designer adds auxiliary rewards, often for progress on intermediate sub-goals, so that the agent receives useful feedback earlier. Shaping does not by itself guarantee that the agent pursues the overall goal, since a poorly chosen bonus becomes a new target the agent can exploit, so shaping terms must be designed carefully to reduce the risk of unintended consequences.
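A well-studied safe form is potential-based shaping (Ng, Harada, and Russell, 1999), which adds the bonus F(s, s′) = γΦ(s′) − Φ(s) for some potential function Φ over states; this form provably leaves the optimal policy unchanged, whereas an arbitrary hand-made bonus can become a new target. The sketch below assumes a hypothetical five-state chain with a single reward for reaching the goal, uses negative distance to the goal as Φ (zero at the terminal state, which keeps the guarantee exact in this episodic setting), and compares it with a naive bonus for entering state 2. The environment, potential, and bonus are all illustrative choices, not a description of any real system.

```python
from typing import Callable

# Hypothetical toy environment: states 0..4 on a line, actions -1 (left) and
# +1 (right), state 4 is a terminal goal, and entering it gives reward 1.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)
GAMMA = 0.9

RewardFn = Callable[[int, int], float]


def step(s: int, a: int) -> int:
    # Deterministic move, clipped at the ends of the chain.
    return min(max(s + a, 0), GOAL)


def base_reward(s: int, s_next: int) -> float:
    # The sparse "true" reward: 1 for reaching the goal, 0 otherwise.
    return 1.0 if s_next == GOAL else 0.0


def potential(s: int) -> float:
    # Assumed potential: negative distance to the goal, so it is 0 at the terminal state.
    return -float(GOAL - s)


def potential_shaped(s: int, s_next: int) -> float:
    # Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s).
    return base_reward(s, s_next) + GAMMA * potential(s_next) - potential(s)


def naive_shaped(s: int, s_next: int) -> float:
    # A hand-made "progress" bonus: +0.5 every time the agent enters state 2.
    return base_reward(s, s_next) + (0.5 if s_next == 2 else 0.0)


def greedy_policy(reward_fn: RewardFn, sweeps: int = 500) -> list[int]:
    """Run value iteration, then return the greedy action for each non-terminal state."""
    values = [0.0] * N_STATES  # V(GOAL) stays 0 because the episode ends there

    def q(s: int, a: int) -> float:
        s_next = step(s, a)
        return reward_fn(s, s_next) + GAMMA * values[s_next]

    for _ in range(sweeps):
        values = [0.0 if s == GOAL else max(q(s, a) for a in ACTIONS) for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(GOAL)]


def reaches_goal(policy: list[int], start: int = 0, max_steps: int = 20) -> bool:
    """Follow the policy from `start` and report whether it reaches the goal."""
    s = start
    for _ in range(max_steps):
        if s == GOAL:
            return True
        s = step(s, policy[s])
    return s == GOAL


if __name__ == "__main__":
    for name, fn in [("base", base_reward), ("potential", potential_shaped), ("naive", naive_shaped)]:
        policy = greedy_policy(fn)
        print(f"{name:>9}: policy={policy} reaches_goal={reaches_goal(policy)}")
```

Value iteration finds the same "always move right" policy with potential-based shaping as with the sparse reward alone, while the naive bonus makes it optimal to loop around state 2 collecting the bonus and never reach the goal.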
Another strategy is to adopt broader, more holistic evaluation frameworks. Instead of relying on a single metric such as perplexity to assess a language model, researchers might consider a range of factors, including creativity, empathy, and the ability to generalize to new situations. A more comprehensive view helps developers build systems that reflect the desired outcomes rather than systems tuned to one narrow metric.
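Perplexity itself is simple to compute: it is the exponential of the mean negative log-likelihood per token. The sketch below shows that calculation alongside a hypothetical scorecard check that flags every metric that regressed between two evaluations, so a gain on perplexity does not silently mask a loss elsewhere. The helper `regressions`, the metric names, and the scores in the usage example are invented for illustration, and how you obtain per-token log-probabilities depends on your model and library.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """exp of the mean negative log-likelihood per token.

    `token_logprobs` are natural-log probabilities the model assigned to each
    observed token; how you obtain them depends on your model and library.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def regressions(old: dict[str, float], new: dict[str, float],
                higher_is_better: dict[str, bool], tol: float = 0.0) -> list[str]:
    """Names of metrics that got worse by more than `tol` from `old` to `new`."""
    worse = []
    for name, better_when_higher in higher_is_better.items():
        delta = new[name] - old[name]
        got_worse = (delta < -tol) if better_when_higher else (delta > tol)
        if got_worse:
            worse.append(name)
    return worse


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every observed token has perplexity 4.
    print(round(perplexity([math.log(0.25)] * 10), 6))

    # Invented scores for two checkpoints, for illustration only.
    directions = {"perplexity": False, "heldout_accuracy": True, "human_rating": True}
    old = {"perplexity": 12.0, "heldout_accuracy": 0.71, "human_rating": 3.9}
    new = {"perplexity": 10.5, "heldout_accuracy": 0.66, "human_rating": 3.9}
    print(regressions(old, new, directions))  # -> ['heldout_accuracy']
```

In the example, the new checkpoint improves perplexity but is flagged for lower held-out accuracy, which a single-metric comparison would have missed.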
The concept of "value alignment" has also gained traction in the AI community as a way to address the challenges posed by Goodhart’s law. Its goal is to ensure that AI systems act in accordance with human values and preferences. By incorporating ethical considerations and diverse perspectives into the development process, researchers aim to create systems that not only perform well on specific metrics but also behave in ways that are beneficial and desirable in a broader sense.
In conclusion, Goodhart’s law is a crucial reminder for organizations like OpenAI as they navigate AI development. By recognizing the limits of relying on any single metric and embracing alternative approaches to measurement and optimization, the AI community can work toward systems that are not only effective but also aligned with the diverse and evolving needs of society. As the field matures, Goodhart’s law will continue to shape how AI systems are designed, evaluated, and deployed.