Submissions:2025/Beyond Link Repair: The Work of InternetArchiveBot and WaybackMedic in Strengthening Wikipedia’s References

From WikiConference North America

This submission has been noted and is pending review for WikiConference North America 2025.



Title:

Beyond Link Repair: The Work of InternetArchiveBot and WaybackMedic in Strengthening Wikipedia’s References

Type of session:

Lecture (15-30 min)

Session theme(s):

Credibility

Abstract:

Wikipedia relies heavily on external sources to support its content, but link rot continues to threaten the integrity of its references. To address this, InternetArchiveBot (IABot) operates at scale across more than 400 Wikimedia wikis, automatically detecting dead links and attaching corresponding archive URLs, primarily from the Internet Archive’s Wayback Machine. This foundational work ensures that references remain verifiable long after the original source disappears.

However, not all domains are equally manageable. WaybackMedic was developed to handle the cases IABot cannot fix automatically: domains with URL restructuring, redirects, soft 404s, bot blockers, and other complex failures. Each domain often requires bespoke handling. For example, usda.gov serves fake 404 responses to headless browsers, requiring a tailored Puppeteer-based script to access real content. Once validated, fixes are applied to all relevant links across Wikimedia projects via IABot.

In addition to preserving links, efforts are underway to enhance citations. The Linking Archive.org Media Project (LAMP) adds links to books at archive.org where appropriate, making full texts directly accessible to readers. This work integrates naturally with Wikipedia’s mission to share knowledge, particularly for historical or academic sources.

Together, these initiatives (InternetArchiveBot, WaybackMedic, and LAMP) strengthen the reliability, accessibility, and resilience of Wikimedia content. This presentation will outline the roles of each, demonstrate the workflow for tackling difficult domains, and explore how these tools are used in tandem to maintain the health of Wikipedia’s vast reference network.

Author name(s):

Maximilian Doerr, Sawood Alam, GreenC

Wikimedia username(s):

Cyberpower678, GreenC

Affiliated organization(s):

Internet Archive

Estimated length of session

30

Will you be presenting remotely?

I will present in-person

Okay to livestream?

Livestreaming is okay

Previously presented?

Partially, in Katowice, when presenting IABot's link-fixing milestones.

Special requests:

GreenC will not be present for the presentation, but this is largely their work, and they are the primary contributor to this proposal and presentation.