Fine-tuning GPT-2 from human preferences

We’ve fine-tuned the 774M parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we’d only asked them to ensure accuracy), so our models learned to copy. Summarization required 60k human labels; simpler tasks which continue text in various styles required only 5k. Our motivation is to move safety techniques closer to the general task of “machines talking to humans,” which we believe is key to extracting information about human values.

6 April 2026 at 02:54 pm

Researchers at OpenAI have fine-tuned the 774 million parameter GPT-2 language model using human feedback on several tasks. The goal was to align the model's outputs with the preferences of external human labelers, who judged the generated text. The models successfully matched the labelers' preferences, but those preferences did not always match the researchers' own.
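To make the idea of "matching labeler preferences" concrete, here is a minimal sketch of one common way to turn a labeler's choice into a training signal. It assumes a hypothetical reward model has already assigned a scalar score to each candidate text, and it penalizes the model when the labeler's pick does not receive the highest score. The function name `preference_loss` and the numbers are illustrative assumptions, not details taken from the article.

```python
import math
from typing import Sequence


def preference_loss(rewards: Sequence[float], chosen: int) -> float:
    """Negative log-probability of the labeler's pick under a softmax over rewards.

    `rewards` are hypothetical scalar scores for each candidate text;
    `chosen` is the index of the candidate the labeler preferred.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(rewards)
    log_norm = m + math.log(sum(math.exp(r - m) for r in rewards))
    return log_norm - rewards[chosen]


# Illustrative scores for four candidate continuations (made-up numbers).
uninformed = preference_loss([0.0, 0.0, 0.0, 0.0], chosen=0)
agrees = preference_loss([3.0, 0.0, 0.0, 0.0], chosen=0)
disagrees = preference_loss([0.0, 3.0, 0.0, 0.0], chosen=0)
```

In this sketch the loss is smallest when the score for the labeler's pick is well above the others, so minimizing it over many labeled comparisons pushes the scores toward the labelers' preferences, whatever those preferences turn out to be.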

One notable example occurred in the summarization tasks. The researchers had asked the labelers only to ensure that the summaries were accurate. The labelers, however, preferred sentences copied wholesale from the input text, so the models learned to copy rather than paraphrase. This outcome shows how closely the instructions given to human labelers shape what a model ends up learning.

The amount of human feedback required varied significantly with the task. Summarization needed 60,000 human labels to match the labelers' preferences, while simpler tasks that involved continuing text in various styles required only 5,000. This twelvefold difference suggests that more complex tasks demand considerably more labeling effort and more careful management of human feedback.

The motivation behind this research stems from the belief that moving safety techniques closer to the general task of "machines talking to humans" is crucial for extracting information about human values. By incorporating human feedback directly into the training process, researchers aim to create models that not only perform well on specific tasks but also better understand and adhere to human expectations and preferences. This approach could have significant implications for the development of AI systems that interact with people in a variety of contexts, from customer service to content generation.

The success of fine-tuning GPT-2 using human feedback underscores the potential of collaborative human-AI training methods. However, it also emphasizes the need for continuous evaluation and refinement of the labeling process to ensure that the models are learning the desired behaviors. As AI systems become more integrated into our daily lives, the ability to align their capabilities with human values will be essential for building trust and ensuring their beneficial impact on society.

In conclusion, fine-tuning GPT-2 from human preferences shows that a language model can be trained to match what labelers prefer, and that what labelers prefer may differ from what researchers intended. Further research will be needed to close that gap.

Source: OpenAI News
