Language Documentation, Ethics, and Artificial Intelligence: Technical-Ethical Challenges for Minority Languages

Aliyev Salman; Aliyev Ceyhun; Mammadova Zeynab; Akhundov Ayaz

Authors

Aliyev Salman, Nakhchivan State University
Aliyev Ceyhun, Nakhchivan State University
Mammadova Zeynab, Nakhchivan State University
Akhundov Ayaz, Nakhchivan State University

Keywords:

language documentation, low-resource languages, machine learning, dataset datasheets, model cards, data provenance, informed consent, indigenous data governance

Abstract

The digital preservation of endangered and low-resource languages (LRLs) increasingly intersects with the training and deployment of large multilingual machine-learning models. This intersection raises technical and ethical challenges that can be addressed through empirical and engineering work, not only through normative debate. In this paper we (1) synthesize relevant literature on language documentation and machine-learning transparency practices; (2) identify four measurable problem domains: dynamic consent, provenance traceability, layered rights (individual vs. collective), and benefit allocation; and (3) propose a program of testable technical interventions (machine-readable provenance records, dataset datasheets, model cards, and prototype dynamic-consent mechanisms), together with experimental designs to evaluate their efficacy. Our contribution is methodological and operational: we reframe ethical requirements as concrete engineering and evaluation tasks that archives, researchers, and model developers can implement and measure. We conclude with a prioritized research agenda and practical recommendations for archival practice and model documentation.
