Scribe's reference API


Lucie-Aimée Kaffee, Hady Elsahar

Wikimedia username:

Frimelle, Hadyelsahar

E-mail address:

lucie.kaffee@gmail.com, hadyelsahar@gmail.com


Geographical impact:

Global, with a focus on speakers of low-resource languages (mostly in the global south, or speakers of minority languages in the global north)

Type of project:


What is your idea?

Information on the Web and on Wikipedia is biased towards the knowledge of the global north [1]. The lack of information from multiple points of view about the global south in minority languages is a threat to the credibility of information from those regions.

This is the main motivation behind the Scribe project. We want to lower one of the barriers that keep new editors from participating in Wikipedia by supporting their editing experience with locally harvested resources in their target language. One of the main barriers they encounter is learning how a typical Wikipedia article is written.

Scribe is a project that kicked off in July 2019 with a one-year grant from the Wikimedia Foundation. Scribe aims to support new editors in low-resource languages in creating high-quality articles about entities and events from their culture and environment by guiding them through the editing process (more details are on the Scribe web page [2]; see also our demo created at the Wikimania hackathon [3]).

We want to investigate more advanced methods for collecting and suggesting references: what constitutes a credible reference, and how to find one for a relevant topic. This WCNA grant will enable us to pursue further research and development on the reference suggestion features of Scribe.

When an editor wants to create a new article in their language, Scribe helps them in two ways: first, by giving them a structure in the form of suggested section headers; second, by suggesting a set of high-quality reference links, each summarized into its most important points. The summaries help editors decide whether a reference is useful to add to the article they are writing. However, references on the web do not come with an indicator of credibility, which makes it challenging, especially for new editors, to select good references.

In Scribe, we want to explore how we can indicate to a user how credible the domain of a reference is. One good indicator is how often this domain has been cited in existing Wikipedia articles. Such indicators are useful for Scribe and can also be reused in other applications that work with Wikipedia's references. Furthermore, they will allow us to explore how references are currently used across the different language Wikipedias.
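As a rough illustration of the citation-count indicator described above, the following Python sketch counts how often each web domain appears in a list of cited URLs. The function names and the sample URLs are hypothetical and chosen only for illustration; the proposal does not specify how references would be harvested or how domains would be normalized.

```python
from collections import Counter
from urllib.parse import urlsplit


def reference_domain(url: str) -> str:
    """Return the lower-cased host of a reference URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_citation_counts(reference_urls):
    """Count how often each domain appears among the cited URLs."""
    counts = Counter()
    for url in reference_urls:
        domain = reference_domain(url)
        if domain:  # skip strings that have no recognizable host
            counts[domain] += 1
    return counts


# Hypothetical sample of URLs cited in existing articles.
sample_urls = [
    "https://www.example.org/news/1",
    "https://example.org/news/2",
    "http://archive.example.net/item/7",
    "not a url",
]
counts = domain_citation_counts(sample_urls)
print(counts.most_common())
```

Stripping only the "www." prefix is a simplifying assumption; a real indicator would probably need a more careful rule for grouping subdomains.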

This grant will enable us to create an API that lists references based on usage and topic domain. This means that when a user wants to write an article in a given topic domain, we can suggest a list of references that have already been used on Wikipedia for that domain. The API is a useful tool for Scribe, and it can also be used outside the project wherever the credibility of references from a Wikipedia perspective is of interest.
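To make the idea of listing references by usage and topic domain more concrete, here is a minimal in-memory sketch in Python. The class `ReferenceIndex`, its method names, and the language and topic labels are all assumptions made for illustration; they do not describe an existing Scribe interface, and a real service would read from harvested Wikipedia data rather than from hand-added citations.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


class ReferenceIndex:
    """Hypothetical in-memory stand-in for the data behind the proposed API."""

    def __init__(self):
        # (language, topic) -> Counter mapping domain -> number of citations
        self._usage = defaultdict(Counter)

    def add_citation(self, language, topic, url):
        """Record that an article on `topic` in `language` cites `url`."""
        host = urlsplit(url).hostname
        if host:  # ignore strings without a recognizable host
            self._usage[(language, topic)][host] += 1

    def top_domains(self, language, topic, limit=5):
        """Return up to `limit` (domain, citation_count) pairs, most cited first."""
        return self._usage.get((language, topic), Counter()).most_common(limit)


# Hypothetical citations; the topic labels are placeholders.
index = ReferenceIndex()
index.add_citation("ar", "politicians", "https://news.example.org/a")
index.add_citation("ar", "politicians", "https://news.example.org/b")
index.add_citation("ar", "politicians", "https://gov.example.net/c")
index.add_citation("en", "scientists", "https://journal.example.com/d")

print(index.top_domains("ar", "politicians"))
```

Keying the counts by language and topic together keeps usage statistics separate per language Wikipedia, which matches the goal of comparing reference use across languages.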

1- Uneven Geographies of User-Generated Information: Patterns of Increasing Informational Poverty (Graham et al. 2014)
2- https://meta.wikimedia.org/wiki/Scribe
3- https://www.youtube.com/watch?v=-XZj0h5-hV0

Why is it important?

Scribe is important for creating articles in contexts where there is a lack of editors. Further, what the project learns about the reusability of Wikipedia references can carry over to a variety of other projects, both inside and outside Wikipedia. What a credible reference is, and what can be suggested to users as one, is a question asked on a much larger scale. One example of such a project is WikiRank, which we created at the Hack the Press Hackathon in London, 2020. When a user reads an article on the web, WikiRank shows them how often the website is cited for the topic they are reading about and lists websites that are cited more often: https://github.com/articlewikirank/hackthepress-wikirank. This project currently only mocks the actual reading of references and would need an API like the one we propose to collect usage statistics on references efficiently.

Is your project already in progress?

How is it relevant to credibility and Wikipedia? (max 500 words)

Any work on credibility first has to state what is meant by credibility. From a Wikipedia perspective, we can define the credibility of a source by how much it is used, especially when usage is measured within a topic domain. Our API would enable a user to, e.g., look up the most cited sources for American politicians or female scientists in a given language. Scribe in particular makes it easier to write credible, good Wikipedia articles by suggesting references, thereby fostering a culture of using high-quality references on Wikipedias in different languages.

What is the ultimate impact of this project?

Could it scale?

The project scales by default: everything we do and create is open source and provided to the Scribe project through an API that can be used by other projects, too. Further, the idea of suggesting credible sources can scale across languages and projects. The knowledge about what we can suggest as a credible reference in Wikipedia can be reused in other projects. What counts as a credible reference in different languages can provide interesting insight into credibility across languages on a large scale.

Why are you the people to do it?

We have previous experience in leading grant-based projects, as demonstrated by Scribe, which successfully received a project grant from the Wikimedia Foundation. Additionally, we have the relevant technical knowledge and a close connection to the community, which we believe is crucial for this project. Lucie [1] has worked for years at the intersection of Wikimedia projects (Wikidata, low-resource language Wikipedias) and research, and has successfully created projects that are now widely used on Wikipedia, such as the ArticlePlaceholder, while publishing about her work with Wikipedia in research venues. She has worked on references in Wikimedia in a previous project, focusing on Wikidata references. Hady [2] is an NLP and machine learning researcher who has experience in conducting research for community needs and has published in world-class NLP conferences, leading research on a variety of topics with a focus on document retrieval and summarization. He has worked in the research labs of large companies (such as Microsoft, Bloomberg, Naver), which gives him insight into how to perform product-driven research and push research into production.

1- https://luciekaffee.github.io/
2- https://www.hadyelsahar.io/

What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

What are the challenges associated with this project and how you will overcome them?

How much money are you requesting?

$10,000 USD

How will you spend the money?

The money will be spent on our time for research: we will work part time for 3 months on the project, enabling us to produce research publications on credible sources for Scribe in Wikipedia and to put our knowledge into practice in the Scribe project. We aim to attend Wikimedia conferences as well as research conferences. We additionally need development time to ensure that the API is scalable and easily reusable.

5500 € - research work, 20h/week for 3 months ($22.91 p/h)
2500 € - development work, 15h/week for 2 months ($20.83 p/h)
2000 € - conference travel (Wikimedia conferences, research conferences) and buffer, e.g. server costs, event organization and similar

How long will your project take?

3 months

Have you worked on projects for previous grants before?