2019/Grants/Classifying Wikipedia Actors
Title:
Classifying Wikipedia Actors
Name:
Carlin MacKenzie
Wikimedia username:
carlinmack
E-mail address:
carlin.mackenzie@gmail.com
Resume:
Geographical impact:
English
Type of project:
Technology
What is your idea?
Wikipedia currently has good tools for detecting vandalism in individual edits; this project aims to classify users who engage in misconduct over time. Because no tools exist for this, volunteer moderators carry an excessive load: they have to go through a user's edits manually to see whether there is a pattern of bad behaviour. Not only does this waste valuable volunteer time, it also means that only the editors with the most egregious editing histories are caught.
I am currently doing research in this area during my study abroad year at the University of Virginia. This involves creating a database of all edits made on Wikipedia since 2001, which will then be used to classify good and bad actors.
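As a minimal sketch of what classifying actors from such a database might look like, the Python below aggregates per-edit damaging probabilities (assumed to come from an existing edit-level model) into per-user statistics and flags users whose history shows a persistent pattern rather than a single bad edit. The `Edit` record, the `flag_users` function and every threshold are hypothetical illustrations, not the project's actual method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edit:
    user: str
    damaging_prob: float  # assumed per-edit score in [0, 1] from an edit-level model


def flag_users(edits, edit_threshold=0.8, user_fraction=0.3, min_edits=5):
    """Return {user: fraction of likely-damaging edits} for flagged users.

    All thresholds are illustrative assumptions, not tuned values.
    """
    totals = defaultdict(int)
    damaging = defaultdict(int)
    for e in edits:
        totals[e.user] += 1
        if e.damaging_prob >= edit_threshold:
            damaging[e.user] += 1

    flagged = {}
    for user, n in totals.items():
        if n < min_edits:
            continue  # too little history to judge a pattern over time
        frac = damaging[user] / n
        if frac > user_fraction:
            flagged[user] = frac
    return flagged


# User A: 4 of 6 edits look damaging; B: clean history; C: too few edits to judge.
edits = [Edit("A", 0.9)] * 4 + [Edit("A", 0.1)] * 2 + [Edit("B", 0.05)] * 10 + [Edit("C", 0.95)] * 2
print(flag_users(edits))  # {'A': 0.6666666666666666}
```

Requiring a minimum number of edits is one simple way to separate a pattern of behaviour from a one-off mistake; a real classifier would likely use richer features than a single ratio.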
This project is currently in progress, and I envisage the following next steps:
- Detecting more complex forms of misconduct, such as bad-faith complaints or discussion
- Developing a system that ingests data incrementally, as edits are made, and flags users whose reputation scores are decreasing. This could be integrated into the Recent changes feed.
- Starting a community discussion about adding this feature to Wikipedia; there are many avenues for this, both with and without endorsement from the Wikimedia Foundation.
- Creating documentation to make research into Wikipedia more approachable.
- Seeking community review to improve the current Research pages on Meta-Wiki.
- Creating reports and visualisations of users, misconduct and editing patterns on Wikipedia
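To illustrate the online flagging step, here is a minimal sketch, assuming each incoming edit arrives with a quality value in [0, 1] (for example, one minus a per-edit damaging probability). It keeps two exponentially weighted moving averages per user, one with short memory and one with long memory, and flags a user when the short-term average falls clearly below the long-term one, i.e. when their reputation is trending down. The class name, parameters and defaults are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserState:
    fast: float = 1.0  # short-memory average of edit quality (1.0 = good)
    slow: float = 1.0  # long-memory average of edit quality
    edits: int = 0


class ReputationTracker:
    """Hypothetical online reputation tracker with a decreasing-trend flag."""

    def __init__(self, fast_alpha=0.3, slow_alpha=0.05, margin=0.3, min_edits=3):
        self.fast_alpha = fast_alpha  # weight of the newest edit in the fast average
        self.slow_alpha = slow_alpha  # weight of the newest edit in the slow average
        self.margin = margin          # how far fast must drop below slow to flag
        self.min_edits = min_edits    # ignore users with very short histories
        self.users = {}

    def update(self, user, quality):
        """Ingest one edit and return True if the user should be flagged."""
        s = self.users.setdefault(user, UserState())
        s.fast += self.fast_alpha * (quality - s.fast)
        s.slow += self.slow_alpha * (quality - s.slow)
        s.edits += 1
        return s.edits >= self.min_edits and s.slow - s.fast > self.margin


tracker = ReputationTracker()
# Five good edits, then bad ones: a single bad edit is tolerated,
# a second consecutive one shows a decreasing trend and raises the flag.
flags = [tracker.update("ExampleUser", q) for q in [1.0] * 5 + [0.0, 0.0]]
print(flags)  # [False, False, False, False, False, False, True]
```

Each edit costs constant time and each user needs only three numbers of state, which is the kind of lightweight processing that could plausibly run alongside the Recent changes feed.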
Why is it important?
There are no tools to detect users who engage in misconduct over time, and this misconduct decreases the quality of Wikipedia. As a result, we know neither how big the problem is nor which strategies are effective against it.
Additionally, my work aims to create documentation for Wikipedia research and to improve the existing resources so that the field is more approachable. Wikipedia is a fertile area of research, with applications in network analysis, natural language processing, classification, prediction and the digital humanities. However, it has a steep learning curve before any research can begin. Better documentation will encourage future research in these areas.
Is your project already in progress?
How is it relevant to credibility and Wikipedia? (max 500 words)
The less misconduct there is on Wikipedia, the more credible it is. With the tools I will develop, we can hopefully gauge the scale of the issue and begin devising strategies to address it.
What is the ultimate impact of this project?
Could it scale?
Hopefully, the resulting processing techniques will be lightweight enough to integrate into the Recent changes feed.
Another avenue is a Wikipedia gadget or user script that displays scores for users, much as ORES currently does for edits.
Why are you the people to do it?
I am familiar with the Wikipedia research space and have been a Wikipedian for several years. I have already started this project, and the database is currently being populated with the requisite data. Additionally, I have made contacts in academia, at the WMF and in industry, which will be important for coordinating future strategy.
What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?
What are the challenges associated with this project and how you will overcome them?
How much money are you requesting?
$8,250
How will you spend the money?
The money will cover the following costs.
My university is reluctant to grant me the large amount of storage required to store all of Wikipedia. For this reason, I would like to use funds to purchase several terabytes of storage space so that the research is not limited by storage - $750
Additionally, I would like to have funds available for:
- Development - $4,000
- Community survey - $500
- Documentation - $1,500
- Data analysis, reports and visualisations - $1,500
How long will your project take?
If funded, I will be able to pursue this research from September for the duration of the academic year, until June 2021.
My current research runs until June of this year.
Have you worked on projects for previous grants before?