Analysis and Categorization of the Rusyn Language Using the Whisper Model: Demographic Influences on Linguistic Convergence

Authors

  • Paweł Małecki, AGH University of Krakow

Abstract

The article presents a detailed linguistic analysis of the Rusyn language. It focuses on the language's complex and evolving features, such as pronunciation, and on its individual, regional, and historical variability. The analysis and categorization were performed with an artificial neural network based on the OpenAI Whisper model. Whisper was trained on data covering most official state languages, but because Rusyn is a niche minority (ethnic) language, the model was not specifically trained on Rusyn samples. Consequently, Rusyn speech samples were classified under the most closely related available language labels, which made it possible to assess the linguistic similarity between Rusyn and other, mostly Slavic, languages. The study drew on a diverse group of speakers segmented by gender, age, and geographic location (Poland, Ukraine, Slovakia, Serbia). The results reveal significant resemblances to the dominant language of each of these countries and show correlations between the computed linguistic similarity and the speakers' age.
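The abstract describes the procedure only at a high level, so the following is a minimal sketch of how the post-processing could look, not the author's published pipeline. It assumes each recording has already been scored into a dictionary mapping Whisper language codes (e.g. "sk", "uk", "pl", "sr") to probabilities; with the open-source `openai-whisper` package, `model.detect_language(mel)` returns such a dictionary as its second value for a single log-Mel input. The `Sample` schema, the function names, the use of the label probability as the "similarity" score, and the choice of Pearson correlation are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass
class Sample:
    """One Rusyn recording with speaker metadata (hypothetical schema)."""
    country: str
    age: int
    probs: dict[str, float]  # language code -> probability (assumed Whisper language-ID output)


def top_label(probs: dict[str, float]) -> str:
    """Return the most probable available label, i.e. the 'closest' supported language."""
    return max(probs, key=probs.get)


def label_distribution(samples: list[Sample]) -> dict[str, Counter]:
    """Count the top-1 labels assigned to the samples, grouped by speaker country."""
    dist: dict[str, Counter] = defaultdict(Counter)
    for s in samples:
        dist[s.country][top_label(s.probs)] += 1
    return dict(dist)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def age_similarity_correlation(samples: list[Sample], label: str) -> float:
    """Correlate speaker age with the probability assigned to one language label.

    Samples where the label is missing are treated as probability 0.0 (assumption).
    """
    ages = [float(s.age) for s in samples]
    sims = [s.probs.get(label, 0.0) for s in samples]
    return pearson(ages, sims)
```

Running `age_similarity_correlation` separately for each country and for that country's dominant language label (for example "sk" for speakers in Slovakia) would yield one coefficient per group. A rank-based measure such as Spearman's coefficient would be an equally plausible choice if the relationship with age is not expected to be linear.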

Published

2026-02-17

Section

Acoustics