Submissions:2025/Beyond Link Repair: The Work of InternetArchiveBot and WaybackMedic in Strengthening Wikipedia’s References
This submission has been noted and is pending review for WikiConference North America 2025.
Title:
- Beyond Link Repair: The Work of InternetArchiveBot and WaybackMedic in Strengthening Wikipedia’s References
Type of session:
- Lecture (15-30 min)
Session theme(s):
- Credibility
Abstract:
Wikipedia relies heavily on external sources to support its content, but link rot continues to threaten the integrity of its references. To address this, InternetArchiveBot (IABot) operates at scale across more than 400 Wikimedia wikis. It automatically detects dead links and attaches corresponding archive URLs, primarily from the Internet Archive’s Wayback Machine. This foundational work keeps references verifiable long after the original source disappears.

However, not every domain can be handled automatically. WaybackMedic was developed for the cases IABot cannot fix on its own: domains affected by URL restructuring, redirects, soft 404s, bot blockers, and other complex failures. Each domain often requires bespoke handling. For example, usda.gov serves fake 404 responses to headless browsers, so reaching the real content required a tailored Puppeteer-based script. Once validated, fixes are applied via IABot to all relevant links across Wikimedia projects.

Beyond preserving links, work is also underway to enhance citations. The Linking Archive.org Media Project (LAMP) adds links to books on archive.org where appropriate, making full texts directly accessible to readers. This work fits naturally with Wikipedia’s mission to share knowledge, particularly for historical and academic sources.

Together, InternetArchiveBot, WaybackMedic, and LAMP strengthen the reliability, accessibility, and resilience of Wikimedia content. This presentation will outline the role of each tool, demonstrate the workflow for tackling difficult domains, and explore how the tools work in tandem to maintain the health of Wikipedia’s vast reference network.
Author name(s):
Wikimedia username(s):
- Cyberpower678, GreenC
Affiliated organization(s):
- Internet Archive
Estimated length of session:
- 30 minutes
Will you be presenting remotely?
- I will present in-person
Okay to livestream?
- Livestreaming is okay
Previously presented?
- Partially, in Katowice, when presenting IABot's link-fixing milestones.
Special requests:
- GreenC will not be present for the presentation, but the work is theirs, and they are the primary contributor to this proposal and presentation.