Submissions:2025/Leveraging Robust Links to Prevent Link Rot on Wikipedia

From WikiConference North America

This submission has been noted and is pending review for WikiConference North America 2025.



Title:

Leveraging Robust Links to Prevent Link Rot on Wikipedia

Type of session:

Lecture (15-30 min)

Session theme(s):

Credibility

Abstract:

Unlike traditional scholarly publications, web pages and other online resources often suffer from content drift and link rot over time. Consequently, references citing such resources lose credibility when resolving them leads to error pages or to content that has become unrelated to the original context. The underlying problem is that a citation of a web resource has no way to express the temporal dimension. While some references do express the intended date in the text, the HTML anchor element has no standard way to encode this information in a machine-readable manner.

Robust Links is a proposed standard to bring this capability to anchor elements in the form of HTML5 data-* attributes. We introduce the "data-originalurl", "data-versiondate", and "data-versionurl" attributes to express the original URL of the referenced resource, the date (or datetime) of the intended state or version of the resource, and optionally one or more known-good archived version URLs at which the resource is preserved in the intended state. Currently, these attributes are not interpreted by user agents in any special way, but JavaScript can be used to leverage them in the interim.
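
As a rough illustration of how a tool might read these attributes, here is a minimal Python sketch that uses the standard-library HTML parser. The sample markup, the example.com and archive.example.org URLs, the date, and the names RobustLinkParser and extract_robust_links are all hypothetical, and the handling of several archived URLs as a whitespace-separated list is an assumption rather than something stated in this abstract.

```python
from html.parser import HTMLParser

# Hypothetical example markup; the URLs and the date are placeholders.
SAMPLE_HTML = (
    '<a href="https://example.com/report" '
    'data-originalurl="https://example.com/report" '
    'data-versiondate="2025-03-14" '
    'data-versionurl="https://archive.example.org/20250314/https://example.com/report">'
    'Example report</a>'
)


class RobustLinkParser(HTMLParser):
    """Collect the Robust Links data-* attributes from anchor elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        if "data-versiondate" not in attr_map:
            return  # plain link that carries no temporal context
        self.links.append({
            "href": attr_map.get("href"),
            "original_url": attr_map.get("data-originalurl"),
            "version_date": attr_map.get("data-versiondate"),
            # Assumption: several archived URLs are whitespace-separated.
            "version_urls": (attr_map.get("data-versionurl") or "").split(),
        })


def extract_robust_links(html):
    """Return one dict per anchor that declares a data-versiondate."""
    parser = RobustLinkParser()
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    for link in extract_robust_links(SAMPLE_HTML):
        print(link["original_url"], link["version_date"], link["version_urls"])
```

Anchors without a "data-versiondate" attribute are skipped here, which is a design choice for this sketch: it treats the version date as the marker that a link carries temporal context.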

The current approach of fixing broken links requires running a bot that regularly scans all wiki pages for external links, checks the status of those links, and replaces any broken ones with corresponding archived versions from a web archive such as the Internet Archive's Wayback Machine. This approach has numerous inefficiencies and limitations. In this talk, we will discuss potential approaches to integrating Robust Links into MediaWiki so that references are born resilient against link rot.
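
The scan, check, and replace cycle of the bot-based approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code of any actual bot: repair_links, is_alive, and find_archived are hypothetical names, the status check and archive lookup are passed in as functions so the sketch makes no network calls, and the URL pattern is a deliberately crude approximation of external links in wikitext.

```python
import re

# Crude pattern for external links in wikitext (an assumption of this sketch).
URL_PATTERN = re.compile(r"https?://[^\s\]|<>\"]+")


def repair_links(wikitext, is_alive, find_archived):
    """Replace every dead URL in wikitext with an archived copy, if one exists.

    is_alive(url) -> bool and find_archived(url) -> str or None are
    hypothetical callables supplied by the caller.
    """

    def swap(match):
        url = match.group(0)
        if is_alive(url):
            return url  # link still resolves, leave it untouched
        archived = find_archived(url)
        return archived if archived else url  # keep the original if no copy is known

    return URL_PATTERN.sub(swap, wikitext)


if __name__ == "__main__":
    page = "See [https://dead.example.com/a A] and [https://live.example.com/b B]."
    fixed = repair_links(
        page,
        is_alive=lambda url: "dead" not in url,
        find_archived=lambda url: "https://archive.example.org/" + url,
    )
    print(fixed)
```

Note that the replacement discards the original URL and records no intended date, so nothing in the repaired text says which state of the resource the editor meant to cite.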

Author name(s):

Sawood Alam, Shawn M. Jones, Martin Klein, Michael L. Nelson, Herbert Van de Sompel

Wikimedia username(s):

Affiliated organization(s):

Internet Archive, Los Alamos National Laboratory, Pacific Northwest National Laboratory, Old Dominion University, Data Archiving and Networking Services (DANS)

Estimated length of session

30

Will you be presenting remotely?

I will present in-person

Okay to livestream?

Livestreaming is okay

Previously presented?

Yes, we presented at WikiCredCon 2025

Special requests: