From ChatGPT to Google Gemini, large language models play an increasingly important role in our everyday lives. These models belong to the field of natural language processing – or ‘NLP’ – which studies how machines understand and generate human language. Most NLP systems are built using machine learning, with vast amounts of language data as training material. A successfully trained model should then be able to handle new scenarios, an ability called ‘generalization’. For a large language model that generalizes well, a conversation about a topic it hasn’t been trained on, such as a new scientific discovery, should not be a problem.
While NLP researchers widely agree that generalization is important, they don’t agree on what good generalization looks like, what types of generalization exist, or which types matter in which scenarios. To address this, Dieuwke Hupkes and collaborators from 20 different universities and companies conducted the largest study of generalization in NLP to date.
In their paper, ‘A taxonomy and review of generalization research in NLP’, published in Nature Machine Intelligence, they present a map of the generalization research landscape, built on a new taxonomy of five axes along which studies can differ.
The first axis captures the motivation for studying a model’s generalization. Some papers have a purely practical motivation: they want to ensure that models continue to perform well when conditions change. Others have a cognitive motivation: do models generalize the way humans do? Still others look at fairness and inclusivity: does the model behave equally well across languages and for users from all social backgrounds?
The second axis examines the type of generalization being studied, such as generalization across tasks: if a model is trained to answer questions, can it also write poetry? Another type is cross-lingual generalization: if a model is trained mostly on English data, how much additional training data does it need to do well in another language? Google Gemini, for example, having seen only a single grammar book in Kalamang, a language spoken by fewer than 200 people, performs almost as well as a human working from the same material.
To measure generalization, researchers create intentional differences – called shifts – between the training material and the test material. Axes 3 to 5 give a technical description of how those shifts are created.
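As a concrete illustration, the short Python sketch below builds one common kind of shift: the training set contains only short sentences, while the test set contains only long ones, so evaluation probes whether a model handles lengths it has never seen. The split function, threshold, and toy corpus here are invented for illustration and do not come from the paper.

```python
# A minimal sketch (not from the paper) of one way to create a shift:
# train only on short sentences, test only on long ones, so the test
# set probes generalization to sentence lengths never seen in training.

def length_shift_split(sentences, max_train_len=10):
    """Split a corpus so training and test data differ systematically in length."""
    train = [s for s in sentences if len(s.split()) <= max_train_len]
    test = [s for s in sentences if len(s.split()) > max_train_len]
    return train, test

corpus = [
    "the cat sat on the mat",
    "dogs bark",
    "the quick brown fox jumps over the lazy dog near the quiet river bank today",
]
train_set, test_set = length_shift_split(corpus)
print(len(train_set), "training sentences,", len(test_set), "test sentences")
```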
Having developed their taxonomy, Hupkes’ team mapped out the current state of generalization research as a whole, documenting over 700 NLP experiments along the five axes. Among other things, the team found that 70% of generalization studies are motivated by practical concerns, whereas only 3% have a fairness motivation. Similarly, generalization across tasks is studied far more often than cross-lingual generalization. These findings point to an urgent need for more research in these neglected areas, particularly given the risks that large language models pose to underrepresented communities.
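To show what annotating an experiment along the five axes might look like, here is a minimal sketch of one record as a Python dataclass. The field names paraphrase the article’s description of the taxonomy (motivation, generalization type, and three axes characterizing the shift); the exact names for axes 3 to 5 and the values filled in below are assumptions for illustration, not an entry from the study.

```python
from dataclasses import dataclass

@dataclass
class GeneralizationExperiment:
    """One NLP experiment annotated along the taxonomy's five axes.

    Field names paraphrase the article's description; the three shift
    fields stand in for axes 3 to 5, which technically describe how the
    train/test shift was created.
    """
    motivation: str            # axis 1: e.g. practical, cognitive, fairness
    generalization_type: str   # axis 2: e.g. cross-task, cross-lingual
    shift_type: str            # axes 3-5: how the train/test shift
    shift_source: str          #   was created (assumed field names;
    shift_locus: str           #   values below are invented)

# A hypothetical entry, not taken from the 700+ documented experiments.
example = GeneralizationExperiment(
    motivation="practical",
    generalization_type="cross-lingual",
    shift_type="covariate",
    shift_source="naturally occurring",
    shift_locus="train-test",
)
print(example)
```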
This work is part of a larger initiative called GenBench – short for Generalization Benchmarking. Visitors to the GenBench website can find the paper, a visualization of the NLP generalization landscape, and a tool to register new generalization experiments.
The GenBench team is organizing a series of academic workshops and coordinating the creation of collaboratively built generalization benchmarks. The team ultimately hopes that GenBench will lead researchers towards a better, more coordinated approach to NLP evaluation. In turn, this should improve model development, so that when you use a large language model, its responses are trustworthy and reliable, even in new and unexpected scenarios.