OpenAI researchers have trained a neural network to play Minecraft to a standard comparable to that of human players.
The neural network was trained on 70,000 hours of assorted in-game footage, supplemented with a small database of videos in which contractors performed specific tasks in the game, with their keyboard and mouse inputs also recorded.
After fine-tuning, OpenAI found that the model was capable of performing all sorts of complex skills, from swimming to hunting animals and consuming their meat. It also learned the “pillar jump,” a move whereby the player places a block of material below them mid-jump to gain elevation.
Perhaps most impressively, the AI was able to craft diamond tools (which requires a long chain of actions to be performed in order), which OpenAI described as an “unprecedented” achievement for a computer agent.
An AI breakthrough?
The significance of the Minecraft project is that it demonstrates the effectiveness of a new technique deployed by OpenAI in training AI models, called Video PreTraining (VPT), which the company says could accelerate the development of “general computer-using agents.”
Historically, the difficulty with using raw video as a source for training AI models has been that what happened is simple enough to understand, but not necessarily how. In effect, the AI model would absorb the desired outcomes, but not grasp the input combinations needed to achieve them.
With VPT, however, OpenAI pairs a large set of video data drawn from public web sources with a carefully curated pool of footage labeled with the relevant keyboard and mouse movements to establish the foundation model.
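One way to picture how a small labeled pool can unlock a much larger unlabeled one is pseudo-labeling: a model fitted on the labeled footage guesses the inputs behind each unlabeled clip, and the combined data then trains the agent. The sketch below is a toy illustration only. The two-number "frame change" features, the action names and the nearest-centroid model are all assumptions made for the example, not OpenAI's actual data or architecture.

```python
def train_inverse_dynamics(labeled):
    """Fit a toy nearest-centroid model: frame change -> action.

    `labeled` is a list of (frame_delta, action) pairs, where frame_delta is a
    tuple of floats standing in for the visual change between two frames.
    """
    grouped = {}
    for delta, action in labeled:
        grouped.setdefault(action, []).append(delta)
    # One centroid (mean feature vector) per action.
    return {
        action: tuple(sum(col) / len(col) for col in zip(*deltas))
        for action, deltas in grouped.items()
    }


def infer_action(centroids, delta):
    """Return the action whose centroid is closest to the observed change."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda action: sq_dist(centroids[action], delta))


def pseudo_label(centroids, unlabeled_deltas):
    """Attach an inferred action to every unlabeled clip."""
    return [(delta, infer_action(centroids, delta)) for delta in unlabeled_deltas]


# Hypothetical stand-in for the small contractor dataset (inputs recorded).
contractor_data = [
    ((1.0, 0.0), "forward"),
    ((0.9, 0.1), "forward"),
    ((0.0, 1.0), "jump"),
    ((0.1, 0.9), "jump"),
]
# Hypothetical stand-in for the large pool of web video (no inputs recorded).
web_video = [(0.8, 0.2), (0.2, 0.7), (1.1, -0.1)]

idm = train_inverse_dynamics(contractor_data)
pseudo_labeled = pseudo_label(idm, web_video)

# The combined set is what a behavior-cloning policy would then be trained on.
training_set = contractor_data + pseudo_labeled
print(pseudo_labeled)
```

In this toy run the three web clips pick up the labels forward, jump and forward, so the policy ends up with seven labeled examples instead of four.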
To fine-tune the foundation model, the team then plugs in smaller datasets designed to teach specific tasks. In this case, OpenAI used footage of players performing early-game actions, such as chopping down trees and building crafting tables, which is said to have yielded a “big improvement” in the reliability with which the model was able to perform these tasks.
Another technique involves “rewarding” the AI model for hitting each step in a sequence of tasks, a practice known as reinforcement learning. This process is what allowed the neural network to collect all the ingredients for a diamond pickaxe with a human-level success rate.
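The idea of rewarding each step in a sequence can be sketched as a simple reward function that pays out the first time the agent obtains each intermediate item. The milestone list and the flat reward of 1.0 per milestone below are illustrative assumptions, not the values OpenAI used.

```python
# Hypothetical milestones on the way to a diamond pickaxe (illustrative only).
MILESTONES = (
    "log", "planks", "crafting_table", "wooden_pickaxe",
    "cobblestone", "stone_pickaxe", "iron_ore", "furnace",
    "iron_ingot", "iron_pickaxe", "diamond", "diamond_pickaxe",
)


def milestone_rewards(events, milestones=MILESTONES, reward=1.0):
    """Return one reward per event: `reward` the first time a milestone item
    is obtained, 0.0 for repeats and for items that are not milestones."""
    achieved = set()
    rewards = []
    for item in events:
        if item in milestones and item not in achieved:
            achieved.add(item)
            rewards.append(reward)
        else:
            rewards.append(0.0)
    return rewards


# A short made-up episode: items the agent picks up or crafts, in order.
episode = ["log", "log", "planks", "dirt", "crafting_table"]
rewards = milestone_rewards(episode)
total = sum(rewards)
print(rewards, total)
```

Paying only on the first occurrence keeps the agent from farming reward by repeating an easy step, such as collecting logs over and over, and nudges it toward the next item in the chain.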
“VPT paves the way to enable agents to learn how to act by watching a large number of videos on the internet. Compared to generative video modeling or contrastive methods that would only produce representational priors, VPT offers the exciting possibility to directly learn large-scale behavioral priors in more domains than just language,” explained OpenAI in a blog post.
“Although we only experiment in Minecraft, the game is very open-ended and the native human interface (mouse and keyboard) is very generic, so we believe our results bode well for other similar domains, for example computer usage.”
To encourage more experimentation in the space, OpenAI has partnered with the MineRL NeurIPS competition, donating its contractor data and model code to contestants trying to use AI to solve complex Minecraft tasks. The grand prize: $100,000.