2019/Grants/CiteFix: a tool to fix broken references

From WikiConference North America
Revision as of 13:32, 17 February 2021 by KCVelaga (talk | contribs)


Title:

CiteFix: a tool to fix broken references

Name:

Krishna Chaitanya Velaga

Wikimedia username:

KCVelaga

E-mail address:

kcvelaga@gmail.com

Resume:

Jay Prakash https://in.linkedin.com/public-profile/in/jayprakash12345 https://meta.wikimedia.org/wiki/User:Jayprakash12345 https://github.com/Jayprakash-SE

Krishna Chaitanya https://www.linkedin.com/in/kcvelaga/ https://github.com/kcvelaga

Geographical impact:

global

Type of project:

Technology

What is your idea?

English Wikipedia has several maintenance categories that are auto-populated based on errors in referencing and citation templates. There are thousands of pages in these categories that are left unattended for a long time. While some of these are easy to fix manually, some of them aren’t. Our idea is to develop a tool named “CiteFix” which will suggest changes to users to fix the errors.

In layman's terms, the tool will scan through the pages in a defined set of maintenance categories and suggest edits that fix the errors it finds. The interface will display the preview and the wikitext of both the current version and the suggested version. If the user agrees that the suggested edit is correct, they will be able to apply the fix with a single click. They will also have the option to modify the suggested wikitext, in case the tool is not able to generate the suggestion properly, or to skip a suggestion entirely if they are unsure whether it is right. The tool will mark all of its edits with a hashtag, which can be used to track the usage of the tool over time.
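The scanning step described above can be sketched against the MediaWiki API. This is a minimal illustration, assuming a Python backend; the category name used here is only an example, and the real tool would iterate over a configured set of error categories:

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def category_params(category, cont=None, limit=50):
    """Build MediaWiki API parameters for listing a category's members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": str(limit),
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont  # resume a paginated listing
    return params

def pages_in_category(category):
    """Yield page titles from a maintenance category, following pagination."""
    cont = None
    while True:
        url = API + "?" + urllib.parse.urlencode(category_params(category, cont))
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        cont = data.get("continue", {}).get("cmcontinue")
        if not cont:
            return
```

Each title yielded this way would then be fetched, checked against the tool's error patterns, and queued as a suggestion for a user to review.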

Why is it important?

Though many of these errors are an easy fix for experienced users, that is not the case for new editors and readers of Wikipedia. From a reader's perspective in particular, these errors generate large red error tags that make for a poor reading experience. Moreover, a reader who wants to verify a source will not be able to do so, because an error message is displayed instead of the full reference text.

There are at least ten thousand pages in the above categories combined as of now, and the tool can also be used to fix errors that arise in the future. New pages are highly likely to keep entering these error categories, as citation templates are error-prone, especially for new editors, and the tool can help them fix things easily.

Is your project already in progress?

n/a

How is it relevant to credibility and Wikipedia? (max 500 words)

According to WP:WHYCITE, “By citing sources for Wikipedia content, you enable users to verify that the information given is supported by reliable sources, thus improving the credibility of Wikipedia while showing that the content is not original research. You also help users find additional information on the subject; and by giving attribution you avoid plagiarising the source of your words or ideas.”

Every citation added contributes to the credibility and verifiability of Wikipedia. However, it is not enough to add a citation; it must also be added the right way. If not, it defeats the whole purpose, as a general reader will not be able to access the source. In a community of thousands of volunteers, mistakes are inevitable. This tool helps fix those mistakes in a simple, easy, and engaging way, thereby contributing, at least a little, to the credibility of Wikipedia.

What is the ultimate impact of this project?

We envision the ultimate impact of the project in two categories. The first is internationalisation: though at this stage we will focus on English Wikipedia only, we will ensure that the tool can be internationalised for other language Wikipedias without a huge effort. If the tool is used on more language Wikipedias, the impact will be far more significant than on English Wikipedia alone. As we progress towards the end of the project, we will talk to at least one or two other language communities about whether they would like to adopt the tool, see what is possible, and develop documentation for doing so.

The second component is to use this tool in campaigns. A campaign focused on fixing such basic issues can be a great help for newcomers, and the tool could also be used in campaigns such as 1lib1ref.

Could it scale?

Yes, the answer to this mostly lies in the answer to the question “What is the ultimate impact of this project?” - scaling is mainly applicable to expansion to other languages.

Why are you the people to do it?

Jay Prakash is a seasoned developer who has created several tools and has been working on MediaWiki since 2017 (500+ of his patches have been merged), as can be seen from his GitHub and Phabricator profiles. He is an expert in web applications and MediaWiki. Additionally, Jay was a Google Summer of Code intern with the Wikimedia Foundation in 2019. Krishna is a seasoned English Wikipedian whose experience will be useful in guiding Jay through the nuances of each of the problems mentioned. Moreover, he is an experienced project manager who has received several grants from the Wikimedia Foundation in the past.

Additionally, the two of us have worked together on a project to redesign the wiki of the University Innovation Fellows program (of the Hasso Plattner Institute of Design at Stanford; https://universityinnovation.org/wiki/Main_Page). During that project, Jay developed several extensions used for learning management, while Krishna focused on the design and coordination of the overall framework. That experience, along with several other collaborations, makes us a great team for this project.

What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

The tool relates directly to the overall credibility of Wikipedia, which can bring in diversity in many ways, and because it can be used by people who are not tech-savvy, it is inclusive. Beyond that, while there may be unforeseen impacts, we are not able to establish a direct connection on this issue.

What are the challenges associated with this project and how you will overcome them?

  • Integration with Wikimedia: Logging in via OAuth to Wikimedia projects is not trivial for tools. We have to apply for app credentials on Meta-Wiki, after which a steward approves the application. We will then use the mwoauth Python library in our web backend to handle OAuth with Wikimedia projects.
  • Reference error patterns: The tool works primarily through pattern identification, which cannot be done with a single expression or function; a lot of patterns need to be identified. We will maintain a set of regular expressions, constantly updated to recognise new patterns, and create a form for users to report any unrecognised patterns.
  • Mobile friendliness: This will be a challenge, as we have worked on mobile-friendly web apps only a few times. We will use CSS media queries to identify device classes such as mobile, tablet, and desktop, style the app (margins, padding, box model, etc.) accordingly, and set breakpoints based on screen size.
  • Deployment of workers: Worker queues run in the background, so they need continuous monitoring. We will use Celery for the tool's workers and write a custom script that checks their status and restarts them if it finds they have stopped.
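The pattern-matching approach described under "Reference error patterns" could look something like the following sketch. The pattern table here is hypothetical, not the tool's actual set; each entry pairs a regex for a known reference error with a replacement, and a suggested fix is always shown to the user before anything is saved:

```python
import re

# Hypothetical pattern table (illustrative, not the tool's real set):
# each entry pairs a regex for a known reference error with a replacement.
FIX_PATTERNS = [
    # Closing tag typed with stray spaces, e.g. "< /ref>" or "</ ref >".
    (re.compile(r"<\s*/\s*ref\s*>"), "</ref>"),
    # Ref name containing spaces but missing quotes, e.g. <ref name=smith 2010>.
    (re.compile(r'''<ref name=([^"'>/]+?)\s*>'''), r'<ref name="\1">'),
]

def suggest_fix(wikitext):
    """Apply all known patterns; return (fixed_text, changed).

    The tool would show fixed_text next to the original and let the
    user accept, edit, or skip the suggestion before anything is saved.
    """
    fixed = wikitext
    for pattern, replacement in FIX_PATTERNS:
        fixed = pattern.sub(replacement, fixed)
    return fixed, fixed != wikitext
```

For example, `suggest_fix('<ref name=smith 2010>Smith< /ref>')` would propose `<ref name="smith 2010">Smith</ref>` and flag the page as changed. Unrecognised errors simply produce no change, which is where the user-reporting form comes in.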

How much money are you requesting?

USD 13,500

How will you spend the money?

  • Developer: USD 8,000 (400 hrs * $20/hr)
  • Product Manager: USD 4,000 (200 hrs * $20/hr)
  • Development sprints: USD 500 (travel, per diem)
  • Contingency: USD 500
  • Fiscal sponsor and bank charges: USD 500

How long will your project take?

If we assume that we receive a funding decision by the end of March, the following will be the broad timeline:

  • 2021 April & May: Research, prep, and setting up the group
  • June: Development Sprint 1
  • July & August: Continued development and fixes
  • September: Development Sprint 2 (final)
  • October & November: User testing and feedback
  • December: Fixes and improvements from user testing
  • 2022 January & February: Documentation and reporting

Note: The timeline can be condensed if needed; however, we would prefer this schedule.

Have you worked on projects for previous grants before?

  • https://meta.wikimedia.org/wiki/Grants:Conference/KCVelaga/Wikigraphists_Bootcamp_(2018_India)/Report
  • https://meta.wikimedia.org/wiki/Grants:Project/Rapid/VVIT_WikiConnect/Annual_Plan_(2018%E2%80%932019)/Report
  • https://commons.wikimedia.org/wiki/Commons:SVG_Translation_Campaign_2019_in_India/Report
  • More at https://meta.wikimedia.org/wiki/User:KCVelaga/Outreach