2019/Grants/Increasing the availability and credibility of endangered language research on Wikipedia

Increasing the availability and credibility of endangered language research on Wikipedia


Anna Luisa Daigneault

Research + Output

What is your idea?

We live in a pivotal time for endangered language documentation and revitalization. There are over 3,000 threatened languages in the world, many of which may disappear by 2100, or sooner. While it is natural for languages to come and go over time, we are currently living through a period of steep decline in linguistic diversity. Recent studies indicate that one language goes extinct every 3.5 months. Several times every year, the last speaker of a language passes away, and there is no new generation of speakers to take their place. Once a language is gone, it is very difficult to bring it back, although it’s not impossible. Language preservation, reconstruction, and revival can all be accomplished by studying credible and current information such as community websites, legacy materials and recorded resources. Wikipedia is the perfect place to aggregate and highlight such information for the public because it is freely accessible, and easy to update with new research findings.

People learning about their heritage languages, scholars, and cultural stakeholders need access to centralized, up-to-date and credible resources with information about endangered languages. Ethnologue went behind a paywall in recent years, and many of its language pages contain information that has not been updated in decades. Ethnologue is unfortunately no longer as accessible or reliable as it used to be. Wikipedia is now one of the first places where people go to obtain online information about endangered languages. Wikipedia often leads researchers and community activists to other up-to-date and credible websites such as Glottolog and Endangered Languages Catalog (ELCat), where they can find even more detailed scientific information that has also been vetted and peer-reviewed.

Some Wikipedia pages on endangered languages such as Atikamekw are very well-developed and signal robust community collaboration and online presence. However, most endangered language pages on Wikipedia lack basic features such as population and regional information, history of the language, grammatical features, relevant cultural information about the speech communities who identify with these languages, as well as references that point to current publications and other existing resources approved and used by speech communities.

My idea is to improve the quality, credibility and depth of roughly 250 Wikipedia entries for endangered languages by adding descriptions of the latest research findings with links to their sources, grammatical information, and links to relevant cultural and scientific information about these languages. This focus will drastically improve the quality and credibility of the pages and serve researchers and stakeholders around the globe who are eager to access this knowledge. I would train 10 linguistics interns at Living Tongues Institute for Endangered Languages to improve 25 Wikipedia pages each. Eventually, my goal would be to scale and expand this program to improve and make contributions to all 3,000+ pages for endangered languages on Wikipedia.

Why is it important?

Access to correct information about languages is crucial for people to learn about their heritage and humanity’s linguistic diversity. Without access to vetted and epistemologically sound research related to languages, people veer into speculation and look for correlations where there are none.

Languages are important because they are a living manifestation of humanity’s long history on this planet. They transmit centuries of accumulated wisdom related to human adaptation and survival. They contain vital information related to land management, subsistence patterns, kinship and social relationships, local customs, cosmology and much more. Every language represents a unique way of interpreting and conveying the human experience in a specific cultural and environmental context. Languages are ever-evolving, shared museums of the mind. They are not static; they adapt and change over time, depending on the speakers’ expressive needs and social context.

Each human language has its own rules, its own sound patterns and its own ways of structuring information for ease of communication and comprehension. A language is not only a lens through which one perceives the world, but a vehicle one actively uses to navigate it. A language is a doorway that accesses the human imagination. Sayings, poetry, song lyrics – all are possible because of the expressive power of language. When a language ceases to be spoken and transmitted, its essence effectively fades away. Without systematic documentation, there is little evidence left that the language even existed at all.

Is your project already in progress?

Yes, through my role as Program Director at Living Tongues Institute for Endangered Languages, I have begun training an intern who is editing Wikipedia entries. Together, we began in January 2020 to add links to existing resources such as Talking Dictionaries, as well as links to research and publications. Thus far, we have contributed to over 120 Wikipedia endangered language pages and made incremental progress in adding resources and links. In supervising this effort, I have realized that many of the pages are sorely lacking in information, and I decided to expand our efforts. I have listed our goals publicly on our website: https://livingtongues.org/contribute-to-wikipedia/

How is it relevant to credibility and Wikipedia? (max 500 words)

This project will solidify Wikipedia’s place as the go-to resource for endangered language information. At a time when speculation and disinformation abound online, it is important for Wikipedia to be credible, and experts such as linguists and indigenous knowledge holders can collaborate in disseminating facts by making their findings known and cited correctly on Wikipedia. With specially trained linguists who are passionate about increasing and improving language resources, the quality of these first 250 pages would be a significant contribution both to language activists and to Wikipedia as a whole. This information is not all in one place anywhere else. Language activists are already using Wikipedia pages for endangered languages as their first point of reference, due to their ease of access and availability. As scientific experts in this field, we want to make sure that these pages are credible and up to date, and we can contribute to this goal by supervising and reviewing the quality of each update. These improvements are exactly what the language revitalization community needs.

What is the ultimate impact of this project?

The ultimate impact would be to help centralize the credible resources and research that are available for each endangered language. This will result in increased interest in language access, documentation, and revitalization. By connecting the dots on the existing research, we may be able to have an impact on preserving endangered languages in the long term, and on assisting in their revival.

Could it scale?

Yes, we can eventually scale and expand this program to improve and make contributions to all 3,000+ pages for endangered languages on Wikipedia.

Why are you the people to do it?

Living Tongues Institute for Endangered Languages is a leading nonprofit research organization dedicated to documenting endangered languages. Founded by linguist Dr. Gregory D. S. Anderson, our international teams conduct documentary linguistic fieldwork, publish scientific papers and present at academic conferences, run digital training workshops to empower language activists, and collaborate with speakers to release online tools. Our researchers have created more than 120 Talking Dictionaries to support threatened languages around the globe, and have provided valuable digital skills training as well as tech and scientific support to collaborators. Furthermore, we raise awareness about endangered languages and support language revitalization efforts in many of the communities we work in. Since 2005, Living Tongues Institute has reached more than one hundred endangered language communities in fifteen countries.

At Living Tongues Institute, our team has adopted a vertically integrative approach to language documentation, in which local language consultants learn transferable digital and scientific research skills to eventually become research assistants, colleagues, and ambassadors for their languages. By facilitating digital skills workshops in which we train local indigenous language activists to record and edit words and phrases in their native languages, we take a multi-faceted approach to helping local language activists become ambassadors of their languages. We would take these careful and proven methods and apply them to our training program for interns editing Wikipedia: we would supervise the quality and accuracy of the work they are doing, and also supply them with the latest research links for their contributions to Wikipedia.

What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

Many of the world’s indigenous, small, under-resourced and/or endangered languages do not yet have a significant presence online and we seek to remedy that. As Wikipedia is not behind a paywall, it is accessible to all those who have internet access. We want to help make these language pages a reliable resource for everyone who needs them, both amplifying the voices of the speech communities and serving those who would like to learn more about them. We would be receptive to advice from our colleagues and collaborators in this process to be sure that we are representing as many diverse and credible voices as possible.

What are the challenges associated with this project and how will you overcome them?

Some data will be scarce for certain languages. For example, in some regions such as the Amazon and rural places in South and East Asia, speaker population and demographics are barely known because of lack of documentation. We will do our best to contact researchers that we know in the field to obtain the most accurate data where possible, and provide accurate links to verifiable sources.

How much money are you requesting?

5,000.00 USD

How will you spend the money?

The funds will be spent to cover a total of 200 paid research hours at $25/hour. During those hours, we will identify and examine the 250 Wikipedia language pages that need work, write the content, and list the sources that need to be linked. We will also recruit, train and supervise a cohort of 10 unpaid linguistics interns who will each take on 25 Wikipedia pages to edit, for a total of 250 Wikipedia pages that we will work on together. We will supervise the quality of the information and sources added to each page in collaboration with the interns.
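As a quick check on the figures above, the budget arithmetic can be sketched in a few lines of Python. The hours, rate, intern count and pages per intern are taken from this proposal; the variable names and the per-page averages are illustrative only and are not part of the budget itself.

```python
# Figures from the proposal text; names are illustrative assumptions.
PAID_HOURS = 200        # paid research hours requested
HOURLY_RATE_USD = 25    # rate per paid research hour
INTERNS = 10            # unpaid linguistics interns
PAGES_PER_INTERN = 25   # Wikipedia pages assigned to each intern

# Total request and total number of pages covered.
total_cost_usd = PAID_HOURS * HOURLY_RATE_USD
total_pages = INTERNS * PAGES_PER_INTERN

# Derived averages (not stated in the proposal, shown only for scale).
paid_hours_per_page = PAID_HOURS / total_pages
cost_per_page_usd = total_cost_usd / total_pages

print(f"Total request: ${total_cost_usd:,} for {total_pages} pages")
print(f"Average paid research time per page: {paid_hours_per_page} hours")
print(f"Average cost per page: ${cost_per_page_usd:.2f}")
```

The totals match the 5,000.00 USD requested and the 250 pages planned; the averages simply spread the paid supervisory and research hours evenly across pages, which in practice will vary by language.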

How long will your project take?

1 year

Have you worked on projects for previous grants before?

In the grant-funded projects listed below, I have worked as a researcher, grant writer and technical coordinator. For the projects that are currently active, I assist the principal investigators with linguistic data organization, project management, and other tasks. I also oversee archival efforts and the recruitment and training of interns, teach online workshops, and run brainstorming meetings.

2019-2022. Sora Typological Characteristics: Towards a Re-Evaluation of South Asian Human History. National Science Foundation Linguistics Grant (Award NSF/BCS #1844532).

2019-2021. Citizen science and cinematography: Documenting stories and technology of the Sora tribe (India). National Geographic Citizen Science Grant.

2018-2020. Documenting the Fragile Knowledge Domains of the Birhor People. The Zegar Family Foundation.

2015-2017. Documentation of Gutob, an endangered Munda language. National Science Foundation/Documentation of Endangered Languages Grant (Award #1500092).

2013-2015. Documentation of Hill Gta’, a seriously endangered Munda language. National Endowment for the Humanities/Documentation of Endangered Languages Grant (Award #PD-50025-13). Full documentation of this nearly undocumented endangered language of India.

2013-2015. Melanesia Online: Ethnobotanical, Ethnozoological, Ethnogeographic knowledge of “tok ples” in Papua New Guinea. Christensen Fund.