Submissions:2024/How We Rescued 20 MILLION Dead Wikipedia URLs: An Internet Archive Update

From WikiConference North America

This submission has been noted and is pending review for WikiConference North America 2024.


How We Rescued 20 MILLION Dead Wikipedia URLs: An Internet Archive Update

Type of session:

Lecture (15-30 min)

Session theme(s):

Open Data, Partnerships, Reliable Sources, Technology


Wikipedia is one of the most visited websites in the world, and its reliability and comprehensiveness depend on readers being able to reach its references. Over time, many of those references succumb to link rot, which compromises the integrity of the content.

The Internet Archive's Wayback Machine has become a crucial ally in preserving digital knowledge, most visibly by rescuing over 20 million broken links on Wikipedia. This presentation provides a major update on the collaboration between the Internet Archive and the Wikipedia community to address this large, multi-faceted problem.

The Wayback Machine crawls the web and stores snapshots of pages, so that a preserved version remains accessible even when the original is no longer available. We will discuss the technical methods involved, including automated bots that detect dead links and replace them with archived versions, as well as the manual work of volunteers and editors.
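
As a rough illustration of the detect-and-replace step that such bots perform, the sketch below looks up the closest archived snapshot for a URL through the Wayback Machine's public Availability API and substitutes it into a piece of text. It is a minimal sketch and not the code of any production bot: the helper names (availability_query, closest_snapshot, replace_dead_link) are hypothetical, the plain string substitution is a simplifying assumption, and the actual HTTP request is left to the caller.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived snapshot URL from an API response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def replace_dead_link(text, dead_url, archive_url):
    """Hypothetical, simplified substitution of a dead URL with its archive."""
    return text.replace(dead_url, archive_url)
```

A caller would fetch the address returned by availability_query, pass the response body to closest_snapshot, and call replace_dead_link only when a snapshot was found. Deciding that a link is actually dead, and editing citations in a way the community accepts, are the harder parts and are outside this sketch.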

Attendees will gain an understanding of the challenges posed by digital decay and of the solutions that make large-scale digital preservation feasible. The presentation will also highlight the broader implications for digital heritage and the importance of continued work to keep online information robust and reliable.

Author name(s):

Mark Graham

Wikimedia username(s):


E-mail address:

Affiliated organization(s):

Internet Archive

Able to attend without scholarship?


Estimated length of session


Will you be presenting remotely?

Okay to livestream?

Livestreaming is okay

Previously presented?

Special requests: