2024/How We Rescued 20 MILLION Dead Wikipedia URLs: An Internet Archive Update


This submission has been accepted for WikiConference North America 2024.



Title:

Main Page

Type of session:

Lecture (15-30 min)

Session theme(s):

Open Data, Partnerships, Reliable Sources, Technology

Abstract:

As one of the most visited websites globally, Wikipedia's reliability and comprehensiveness hinge on the accessibility of its references. Over time, many of these references succumb to link rot, compromising the integrity of the content.

The Internet Archive's Wayback Machine has emerged as a crucial ally in the preservation of digital knowledge, particularly through its efforts in rescuing over 20 million broken links on Wikipedia. This presentation provides a major update on the collaborative endeavor between the Internet Archive and the Wikipedia community to address this large, complex, multi-faceted issue.

Using web crawling and archiving technologies, the Wayback Machine captures and stores snapshots of web pages, so that a preserved version remains accessible even if the original page is no longer available. We will discuss the technical methods employed, including automated bots that detect dead links and replace them with archived versions, as well as the manual efforts of volunteers and editors.

Attendees will gain an understanding of the challenges posed by digital decay and the innovative solutions that make large-scale digital preservation feasible. This presentation will highlight the broader implications for digital heritage and underscore the importance of ongoing efforts to maintain the robustness and reliability of online information.
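
As a rough illustration of the bot workflow mentioned in the abstract (detect a dead link, find an archived copy, swap it in), here is a minimal sketch. It is not the code used by the Internet Archive or by any Wikipedia bot. It assumes the public Wayback Machine availability endpoint at `https://archive.org/wayback/available` and the JSON shape that endpoint is documented to return; the function names `build_availability_query`, `closest_snapshot`, and `rescue_link` are hypothetical, and the plain string replacement stands in for the citation-template handling a real bot would need.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the public Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL for a possibly dead link.

    `timestamp` (YYYYMMDD...) asks for the snapshot closest to that date,
    for example the date the reference was originally cited.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived URL from an availability response, or None.

    Assumes the response looks like
    {"archived_snapshots": {"closest": {"available": true,
                                        "status": "200", "url": "..."}}}
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest.get("url")
    return None


def rescue_link(wikitext: str, dead_url: str, archive_url: str) -> str:
    """Hypothetical simplification: replace the dead URL in place."""
    return wikitext.replace(dead_url, archive_url)
```

The network call is left out on purpose so the parsing and replacement logic can be checked offline; in practice the query URL would be fetched with an HTTP client and the response body passed to `closest_snapshot`.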

Author name(s):

Mark Graham

Wikimedia username(s):

User:Markjgraham_hmb

E-mail address:

mark@archive.org

Affiliated organization(s):

Internet Archive

Estimated length of session

30

Will you be presenting remotely?

Okay to livestream?

Livestreaming is okay

Previously presented?

Special requests:


