Abstract: The evaluation of Large Language Models (LLMs) across diverse languages is crucial for ensuring equitable technological progress. However, most multilingual benchmarks are created by ...