A conversational agent is an AI system that can interact with humans in natural language. AI researchers aim to create agents that can hold natural conversations with humans while also carrying out instructions in interactive virtual worlds. These worlds not only give researchers a convenient platform for specifying tasks for their conversational agents, but also allow them to gather large amounts of data and to evaluate the performance of their systems. In recent research, Chris Madge, Massimo Poesio and other members of the ARCIDUCA team at Queen Mary University of London show how conversational agents can be deployed within the virtual world of Minecraft.
Their system implements an ‘architect’ that has a target structure in mind and provides instructions to a ‘builder’ on how to complete that structure using blocks in the Minecraft world. Because the builder is allowed to ask questions, the team’s system is more flexible than agents that can only follow instructions, and better at handling unclear ones.
As the world’s best-selling video game, Minecraft is renowned for the creative freedom it offers its players. That same flexibility has made the game a useful tool for researchers exploring how people and AI agents can work together to perform tasks. In previous studies, researchers investigated a task in which one player, the ‘architect’, provides language-based instructions for another player, the ‘builder’, to create a target structure by placing blocks within the Minecraft world.
For Madge and Poesio, this setup was especially useful because it combines language with real-time actions in a shared virtual space – allowing them to explore how agents understand instructions and refer to objects in a constantly changing world. Building on this previous research, they replaced the human architect and builder players with large language models – AI systems that are trained to imitate human language. They then tested several different large language models to determine how well each performed as an architect or a builder.
Crucially, the builder in Madge and Poesio’s system can ask questions for clarification. For example, if it isn’t sure which Minecraft block the architect is referring to, it can simply ask for more details – clearing up the ambiguity before placing anything.
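To make this architecture more concrete, the sketch below shows one way such an architect-builder exchange with a clarification step could be structured. It is a minimal illustration rather than the team’s actual implementation: the language model calls are replaced by simple stubs, and all of the function and field names are assumptions made for this example.

```python
# Minimal sketch of an architect-builder dialogue loop with clarification questions.
# The two "turn" functions are stubs standing in for large language model calls.

from dataclasses import dataclass, field

@dataclass
class DialogueState:
    target_description: str                              # what the architect wants built
    world_blocks: list = field(default_factory=list)     # blocks placed so far
    history: list = field(default_factory=list)          # full conversation, as (speaker, text)

def architect_turn(state: DialogueState) -> str:
    """Stub for the architect model: issue the next instruction,
    or answer the builder's question if one was just asked."""
    if state.history and state.history[-1][0] == "builder" and state.history[-1][1].endswith("?"):
        return "Start at the centre of the build area."
    return "Place a red block next to the last one you placed."

def builder_turn(state: DialogueState, instruction: str) -> dict:
    """Stub for the builder model: either place a block or,
    if the instruction is ambiguous, ask a clarification question."""
    if "next to" in instruction and not state.world_blocks:
        # Nothing has been placed yet, so 'next to the last one' is ambiguous.
        return {"move": "ask",
                "text": "I haven't placed anything yet. Where should the first block go?"}
    return {"move": "place",
            "block": {"colour": "red", "position": (0, 0, len(state.world_blocks))}}

def run_dialogue(state: DialogueState, max_turns: int = 6) -> None:
    for _ in range(max_turns):
        instruction = architect_turn(state)
        state.history.append(("architect", instruction))
        reply = builder_turn(state, instruction)
        if reply["move"] == "ask":
            state.history.append(("builder", reply["text"]))
            continue  # the architect responds to the question on the next turn
        state.world_blocks.append(reply["block"])
        state.history.append(("builder", f"placed {reply['block']}"))

state = DialogueState(target_description="a small red tower")
run_dialogue(state, max_turns=3)
for speaker, message in state.history:
    print(f"{speaker}: {message}")
```

In a real system, architect_turn and builder_turn would each prompt a large language model with the dialogue history and the current state of the Minecraft world, and the builder’s reply would be parsed into either a block placement or a question.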
In a follow-up study, the researchers annotated the system’s messages, adding notes that link each message to earlier parts of the conversation or to changes happening in the Minecraft world. The resulting dataset helped them to understand how these models handle language whose meaning depends on context – especially in a virtual world that is constantly changing.
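As a rough illustration of what such annotations could look like, the hypothetical records below attach explicit links from each message to an earlier turn or to a change in the world. The field names and link types are assumptions made for this example, not the team’s actual annotation scheme.

```python
# Hypothetical annotated dialogue records: each message can point back to an
# earlier message (for example, when 'it' refers to a block placed previously)
# or to a change in the Minecraft world. This is an illustrative format only.

annotated_dialogue = [
    {"id": 1, "speaker": "architect", "text": "Put a blue block on the ground.",
     "links": []},
    {"id": 2, "speaker": "builder", "text": "Placed a blue block at (0, 0, 0).",
     "links": [{"type": "world_change", "refers_to": 1}]},  # action carries out message 1
    {"id": 3, "speaker": "architect", "text": "Now put another one on top of it.",
     "links": [{"type": "reference", "refers_to": 2}]},     # 'it' points back to message 2
]

def resolve(message: dict, dialogue: list) -> list:
    """Follow a message's links back to the earlier turns they refer to."""
    by_id = {m["id"]: m for m in dialogue}
    return [by_id[link["refers_to"]] for link in message["links"]]

# 'it' in message 3 resolves to the builder's earlier block placement.
print(resolve(annotated_dialogue[2], annotated_dialogue))
```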
The team’s approach offers a new level of flexibility to conversational agents in virtual worlds. By using powerful language models and focusing on task-based interactions between them, their system not only follows instructions but can also ask questions when clarification is needed. Madge and Poesio hope that their research will pave the way for smarter, more responsive agents that collaborate with us in virtual worlds.