
Learning to play Minecraft with Video PreTraining

We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions). Our model uses the native human interface of keypresses and mouse movements, making it quite general, and represents a step towards general computer-using agents.

6 April 2026 at 02:17 pm

Researchers have trained a neural network to play Minecraft using a technique called Video PreTraining (VPT). The approach uses a massive unlabeled video dataset of humans playing Minecraft to teach a model how to play the game. Only a small amount of labeled contractor data is needed, which greatly reduces the amount of human labeling required.

Minecraft, known for its open-ended gameplay and creative possibilities, is a challenging environment for AI models. Crafting diamond tools usually takes proficient human players over 20 minutes (24,000 actions) and requires a solid understanding of the game's mechanics and environment. With fine-tuning, the neural network trained using VPT can learn to complete this task.
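The two figures quoted above imply an average action rate, which gives a sense of how fine-grained the control problem is. The snippet below is only a worked calculation on the article's numbers; the function name is hypothetical.

```python
# Figures quoted in the article: over 20 minutes and 24,000 actions.
MINUTES = 20
ACTIONS = 24_000


def actions_per_second(actions: int, minutes: float) -> float:
    """Average number of actions per second over the given duration."""
    return actions / (minutes * 60)


rate = actions_per_second(ACTIONS, MINUTES)
print(f"{rate:.0f} actions per second")  # prints "20 actions per second"
```

In other words, the task corresponds to roughly 20 decisions per second sustained for 20 minutes.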

The model's ability to learn from unlabeled video data is the central contribution. By observing recordings of human players, the model can infer the skills and strategies needed to succeed in the game. This saves labeling time and resources, and it exposes the model to a wide range of human behaviors, which may lead to more robust and adaptable AI systems.
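The article does not spell out how unlabeled video is turned into training signal. One generic way to combine a small labeled set with a large unlabeled one is pseudo-labeling: fit a labeler on the labeled data that infers which action happened between two consecutive observations, use it to annotate the unlabeled recordings, then imitate the annotated behavior. The sketch below illustrates that idea in a toy 1-D world. The world, the data and every function name are assumptions made for illustration; it is not the system's actual implementation.

```python
from collections import Counter, defaultdict

# Toy 1-D world (hypothetical): the state is an integer position and an
# action moves it by -1 ("left"), 0 ("stay") or +1 ("right").


def fit_labeler(labeled):
    """Learn (state change -> action) from a small set of (s, a, s_next)."""
    votes = defaultdict(Counter)
    for s, a, s_next in labeled:
        votes[s_next - s][a] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}


def pseudo_label(labeler, states):
    """Attach inferred actions to an unlabeled sequence of states."""
    pairs = []
    for s, s_next in zip(states, states[1:]):
        delta = s_next - s
        if delta in labeler:  # skip transitions the labeler never saw
            pairs.append((s, labeler[delta]))
    return pairs


def clone_behavior(pairs):
    """Simplest behavioral cloning: most frequent action per state."""
    votes = defaultdict(Counter)
    for s, a in pairs:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


# Small labeled set (stands in for the contractor data).
labeled = [(0, "right", 1), (1, "left", 0), (2, "stay", 2)]
# Larger unlabeled recording (stands in for the video): states only.
unlabeled_video = [0, 1, 2, 3, 3, 3]

labeler = fit_labeler(labeled)
policy = clone_behavior(pseudo_label(labeler, unlabeled_video))
print(policy)  # {0: 'right', 1: 'right', 2: 'right', 3: 'stay'}
```

The labeled set here never visits state 3, yet the cloned policy covers it, because the labeler only needs to recognize transitions while the unlabeled recording supplies the states.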

One of the standout features of this model is its use of the native human interface: keypresses and mouse movements. The AI interacts with the game in the same way a human player does, which makes the approach quite general. Because most software is operated through the same keyboard and mouse inputs, this opens up possibilities for AI agents that perform tasks on computers and other devices.
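To make "keypresses and mouse movements" concrete, a single time step of such an interface could be represented as a small record of held keys, held mouse buttons and relative mouse motion. The schema below is a hypothetical sketch; the class name, field names and key names are assumptions and do not come from the article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HumanInterfaceAction:
    """One time step of input (hypothetical schema, for illustration only)."""

    keys: frozenset = field(default_factory=frozenset)           # keys held down
    mouse_buttons: frozenset = field(default_factory=frozenset)  # buttons held down
    mouse_dx: int = 0  # horizontal mouse movement since the last step
    mouse_dy: int = 0  # vertical mouse movement since the last step

    def is_noop(self) -> bool:
        """True when nothing is pressed and the mouse did not move."""
        return (
            not self.keys
            and not self.mouse_buttons
            and self.mouse_dx == 0
            and self.mouse_dy == 0
        )


# Example: move forward while jumping and turning the camera slightly.
step = HumanInterfaceAction(keys=frozenset({"w", "space"}), mouse_dx=12, mouse_dy=-3)
print(step.is_noop())                    # False
print(HumanInterfaceAction().is_noop())  # True
```

Nothing in this record is specific to Minecraft, which is the sense in which an agent acting through it is not tied to one game.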

This achievement represents a step towards general computer-using agents. Many current AI systems are limited to specific tasks or environments and often require extensive retraining for new tasks. Because this model learns from human demonstrations and acts through the same interface humans use, the approach is not tied to game-specific controls, which suggests a path towards AI systems that can perform a wider range of tasks with less human supervision.

The success of this VPT-trained model in Minecraft also has broader implications for the field of AI research. It demonstrates that large-scale unlabeled data, combined with a small amount of labeled data, can be an effective way to train models for complex tasks. This approach could potentially be applied to other domains, such as robotics, autonomous vehicles, and even medical procedures, where learning from human demonstrations could be invaluable.

In conclusion, the Video PreTraining approach has shown promise in teaching neural networks to play Minecraft, including the challenging task of crafting diamond tools. By combining unlabeled video data with a small amount of labeled data, and with fine-tuning, the model can learn this task while using the same interface and interactions as a human player. The result highlights the potential of VPT and points towards more versatile AI systems that can perform a wide range of tasks on computers and other devices.

Source: OpenAI News