2019/Grants/CiteLearn - an academic tool for learning to cite sources

From WikiConference North America


CiteLearn - an academic tool for learning to cite sources


Simon Knight, Heather Ford, Shibani Antonette

Wikimedia username:


E-mail address:



http://sjgknight.com/cv https://hblog.org/ https://antonetteshibani.com/

Geographical impact:


Type of project:


What is your idea?

The practice of adding citations to Wikipedia, and of learning where citations are needed, provides excellent training in research quality beyond Wikipedia. Our idea is to build a text-based task/game in which university students learn about verifiability. In round 1, students must locate where citations should be placed in articles that are provided to them and for which we have ground truth, which allows us to assess the correctness of the student edits. Ground truth will come from (1) articles with known citations, which are removed for the purposes of the task; (2) Template:Citation needed tags; and (3) the "Citation Detective" public dataset of sentences missing citations. In round 2, students must write articles, including citations. In this round, the Citation Detective API, which labels sentences that require a citation, will be used to give students automated feedback and develop their citation practices. Other tools could also be added, including Cite Unseen (which detects bias in citations) and Citation Reason (which classifies the reasons a citation is given; see Chou et al., 2020). We anticipate this game being used to onboard people into Wikipedia citation practices and, more broadly, to develop the skills associated with citation (credibility checking, verifiability, etc.) in a general population.
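The round 1 assessment described above can be sketched as a comparison between the sentences a student marks and the ground-truth set of sentences that need a citation. This is a minimal illustration only: the function name `score_selection`, the use of sentence indices, and the choice of precision, recall and F1 as measures are assumptions made for this sketch, not a settled design for the tool.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class RoundOneScore:
    """Hypothetical summary of one student's round 1 attempt."""
    precision: float  # share of marked sentences that truly needed a citation
    recall: float     # share of citation-needing sentences the student found
    f1: float         # harmonic mean of precision and recall


def score_selection(marked: Set[int], ground_truth: Set[int]) -> RoundOneScore:
    """Compare marked sentence indices against ground-truth indices."""
    true_positives = len(marked & ground_truth)
    precision = true_positives / len(marked) if marked else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return RoundOneScore(precision, recall, f1)


if __name__ == "__main__":
    # Sentences 1, 2 and 3 originally carried citations; the student marked 1, 3 and 5.
    print(score_selection(marked={1, 3, 5}, ground_truth={1, 2, 3}))
```

Reporting precision and recall separately would let feedback distinguish over-citing (marking sentences that need no citation) from under-citing (missing sentences that do).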

The tool will be piloted with students (subject to institutional review). The pilot will use data from the activity (pre/post-test scores) and, where appropriate, pre/post survey instruments, for example on source use competence and citation motivation (see the appendices in Ma and Qin, 2017).

Chou, A. J., Gonçalves, G., Walton, S., & Redi, M. (2020). Citation Detective: a Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale. https://upload.wikimedia.org/wikipedia/commons/6/6f/Citation_Detective_WikiWorkshop2020.pdf

Ma, R., & Qin, X. (2017). Individual factors influencing citation competence in L2 academic writing. Journal of Quantitative Linguistics, 24(2-3), 213-240.

Relevant resources:

  • Citation Detective: https://meta.wikimedia.org/wiki/Citation_Detective
  • Citation Hunt: https://meta.wikimedia.org/wiki/Citation_Hunt
  • https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements#Potential_Applications
  • https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/API_design_research
  • https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statements/Citation_Reason_Pilot
  • https://meta.wikimedia.org/wiki/Cite_Unseen

Why is it important?

Citation practices - the provision of reliable external warrants for claims - are central to trust and credibility not only on Wikipedia, but in journalism, scientific research, scholarly writing, and other genres of research and writing. In university contexts, students are not only being socialised into a particular professional genre, but must respond to institutional contexts and norms (Lea & Street, 2006); a key component of these social norms in academic writing involves the broad set of ‘academic integrity’ practices. A focus on rules for plagiarism avoidance, but not the driving purposes of academia, fails to provide students with an understanding of academic inquiry as a positive aim (McGowan, 2005). Both appropriate paraphrasing, and high quality summary and synthesis writing, are challenging skills to learn; effective paraphrasing is something all students need support in developing (Keck, 2006).

Despite this, there are few learning tools to develop citation practices. Previous 'game based' citation tools rely on multiple-choice-quiz (MCQ) assessments that largely focus on plagiarism and academic integrity, without direct assessment of the authentic practices of writing and citing. A 2015 review of games with a focus on plagiarism prevention (Bradley, 2015) indicates that all ten games reviewed relied on MCQ-style interaction rather than more authentic writing tasks. The tools that we are proposing provide learning tasks for scalable use across teaching contexts, for the improvement of information literacy and credibility practices.

Bradley, E. G. (2015). Using Computer Simulations and Games to Prevent Student Plagiarism. Journal of Educational Technology Systems, 44(2), 240–252.

Keck, C. (2006). The use of paraphrase in summary writing: A comparison of L1 and L2 writers. Journal of Second Language Writing, 15(4), 261–278.

Lea, M. R., & Street, B. V. (2006). The "academic literacies" model: Theory and applications. Theory into Practice, 45(4), 368–377.

McGowan, U. (2005). Plagiarism detection and prevention: Are we putting the cart before the horse. HERDSA.

Is your project already in progress?


How is it relevant to credibility and Wikipedia? (max 500 words)

Verifiability is a fundamental principle of Wikipedia, underpinning its credibility as a source. In addition to its role in tertiary education, the tool could be used to support onboarding into Wikipedia citation practices, both in the Education Program context, and via more informal support provided to new editors. The project supports verifiability processes by providing learning support for the addition of citations to Wikipedia.

What is the ultimate impact of this project?

The ultimate goal of the project is to improve scientific and digital literacy by giving students an apprenticeship into Wikipedia’s best practice relating to citations.

Could it scale?

The tool will be designed as a web app, with resources to support education programs around the world in implementing it. The tool will be designed to support adoption by the community.

Why are you the people to do it?

The research team are established researchers with institutional research infrastructure support to design and conduct projects. The team have consulted WMF researcher Miriam Redi in drafting the proposal and plan.

Two of the researchers have relationships with the Wikimedia community. Heather Ford is a previous Wikimedia Foundation Advisory Board member and is an active member of the Wikipedia research community, having completed a series of research projects in collaboration with the Wikimedia Foundation.

Simon Knight is a previous vice chair of Wikimedia UK and has edited on Wikimedia projects. Simon Knight and Shibani Antonette have research and practice expertise in the development and implementation of educational technology, including in the use of natural language processing to support student learning.

What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

Onboarding new editors is a challenge for Wikipedia, particularly for editors who are not familiar with scholarly citation practices (including but not limited to those without university education, or in their early years of an undergraduate program). The project provides an onboarding pathway and support for new editors, developing an inclusive environment and encouraging contribution to open knowledge.

What are the challenges associated with this project and how you will overcome them?

Challenge 1: There are challenges associated with technical implementation of the project, including integration of datasets and tools, development of an effective user interface, etc. These challenges are controlled by (a) the advance sourcing of datasets and APIs, as described here, and (b) drawing on the expertise in the project team (see above). The risks are mitigated by developing a project plan with milestones, providing for adjustment of the plan at stages through the project.

Challenge 2: Evaluation of learning projects can be a challenge, with many technologies evaluated through satisfaction surveys rather than for impact on learning. The team has expertise in implementation of learning technologies in context, and their evaluation, and has identified evaluation methods as above.

Challenge 3: Recruitment of participants is a potential challenge. This challenge is controlled through our pre-existing relationships with instructors, and our experience both in deploying instructional interventions integrated into classes and in running independent studies with recruitment of students.

Challenge 4: Technological innovation may not result in integration into existing practice, in this case integration into the Wikimedia community and broader learning contexts. This challenge is mitigated, as above, through design of tools and instructional contexts that may be adopted or adapted. The team have worked with the Wikimedia community previously, and will draw on this knowledge and community contacts.

How much money are you requesting?

10,000 USD

How will you spend the money?

The funding will be used to hire a developer for the core technical tasks involved in building a tool that runs the game and collects data. Other costs in tool development include server and cloud storage costs for hosting the tool on the web. Costs for these phases are ~$7,000. Researcher time during the development stages will be provided in kind, at a commensurate amount to the funded portion.

An evaluation phase involving piloting with students, evaluation of learning (described above), and dissemination of findings will be conducted by the researchers. Costs for this phase are ~$3,000.

Project phases comprise:

  • Identify a set of articles for the project (we will seek to consult local instructors and/or the WikiEdu project). Articles will be sourced that are of a high quality per https://ores.wikimedia.org/, of a minimum length (available in the metadata query per https://www.mediawiki.org/w/index.php?title=Manual:Database_layout/diagram&action=render ), and with a higher level of Open Access citations (to maximise availability of these) per https://figshare.com/articles/dataset/Accessibility_and_topics_of_citations_with_identifiers_in_Wikipedia/6819710 .
  • Use the tools labs to develop an interface for students to access articles and respond to interactive exercises (selecting and marking citations), and store responses and feedback messages.
  • Use the tools labs to develop an interface for students to write articles, store these responses, and feedback messages.
  • Design an evaluation pilot with students. Apply to the UTS Human Ethics Committee (IRB equivalent) for the equivalent of an IRB-exempt project, and pilot with students in existing courses as part of their learning. Serve survey items to these students (pre/post) where appropriate. Analyse results, and disseminate findings.
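The article selection criteria in the first phase above (quality, minimum length, share of Open Access citations) could be applied as a simple filter once the metadata has been gathered. The sketch below is illustrative only: the `ArticleCandidate` record, its field names, and all three thresholds are hypothetical, and it assumes the quality class, page length and Open Access ratio have already been retrieved from the sources linked above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ArticleCandidate:
    """Hypothetical record combining metadata gathered for one article."""
    title: str
    quality_class: str        # predicted article quality class, e.g. "GA" or "FA"
    length_bytes: int         # page length taken from the page metadata
    open_access_ratio: float  # share of citations that are Open Access (0.0 to 1.0)


# Assumed thresholds for illustration; the project would need to choose its own.
HIGH_QUALITY_CLASSES = {"GA", "FA"}
MIN_LENGTH_BYTES = 10_000
MIN_OPEN_ACCESS_RATIO = 0.5


def select_articles(candidates: Iterable[ArticleCandidate]) -> List[ArticleCandidate]:
    """Keep only articles that meet all three selection criteria."""
    return [
        article
        for article in candidates
        if article.quality_class in HIGH_QUALITY_CLASSES
        and article.length_bytes >= MIN_LENGTH_BYTES
        and article.open_access_ratio >= MIN_OPEN_ACCESS_RATIO
    ]


if __name__ == "__main__":
    pool = [
        ArticleCandidate("Example A", "FA", 42_000, 0.7),
        ArticleCandidate("Example B", "Start", 55_000, 0.9),  # quality too low
        ArticleCandidate("Example C", "GA", 4_000, 0.8),      # too short
        ArticleCandidate("Example D", "GA", 20_000, 0.2),     # too few Open Access citations
    ]
    for article in select_articles(pool):
        print(article.title)
```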

Costs have been calculated on a UTS professional staff casual scale, for the level of work required for the role. Appointment will be made subject to experience. At this level, funds provide for 151 hours at AUD $51 per hour for the first three bullets. The University is contributing a commensurate amount 'in kind' through staff time.

For the evaluation phase, funds provide for ~55 hours of researcher time.

The University applies a 25% infrastructure costing.

How long will your project take?

The project comprises three discrete tasks that form a work package to be implemented over 3 months, in consultation with the appointed research assistant. Piloting and evaluation will be conducted over the subsequent 6 months.

Have you worked on projects for previous grants before?


Notably, Knight and Shibani have worked on developing automated feedback for student writing: http://heta.io/technologies/. Ford has worked on a number of Wikimedia-related projects, per https://hblog.org/grants-and-awards/

For further details, see CVs on websites above.