Basab Jha
Large Language Models (LLMs) such as GPT-4 and mBERT have revolutionized natural language processing (NLP) by providing multilingual capabilities, making it possible to develop models that handle diverse linguistic inputs across many languages. Despite these advances, however, a noticeable performance gap persists between high-resource languages such as English and low-resource languages such as Nepali or Malagasy. We term this phenomenon the "Babel Effect," highlighting the disparity in performance that arises from differences in resource availability across languages. This paper explores the root causes of these discrepancies in LLMs, focusing on the underlying challenges of tokenization, training, and data scarcity. We use cross-lingual benchmarks, such as XGLUE and TyDiQA, to quantify and examine these performance variations in detail. Furthermore, we propose remedies, including improved tokenization strategies, data augmentation techniques, and refined fine-tuning methods. The paper concludes with a discussion of how these improvements can mitigate the Babel Effect and lead to more equitable language modeling across diverse linguistic contexts.
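One concrete symptom of the tokenization challenge mentioned above is subword fertility (subword tokens per whitespace-delimited word), which tends to be higher for under-represented languages. The minimal sketch below is illustrative only, not the paper's experimental setup; it assumes the Hugging Face `transformers` package, the publicly available `bert-base-multilingual-cased` tokenizer, and a hand-picked English/Nepali sentence pair.

```python
# Minimal sketch (assumes Hugging Face `transformers` and the public
# `bert-base-multilingual-cased` tokenizer): compare subword fertility,
# i.e. subword tokens per whitespace-delimited word, for a parallel
# English/Nepali sentence pair. A higher fertility for Nepali is one way
# the tokenization imbalance behind the "Babel Effect" shows up.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Illustrative parallel sentences (not drawn from any benchmark).
sentences = {
    "English": "Nepal is a beautiful country.",
    "Nepali": "नेपाल एक सुन्दर देश हो।",
}

for language, text in sentences.items():
    subwords = tokenizer.tokenize(text)
    fertility = len(subwords) / len(text.split())
    print(f"{language}: {len(subwords)} subword tokens, "
          f"fertility = {fertility:.2f}")
```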