Submissions:2014/Reconstructing the past with Mediawiki: Programmatic Issues and Solutions


 * Title of the submission: Reconstructing the past with Mediawiki: Programmatic Issues and Solutions


 * Themes (Proposal Themes - Community, Tech, Outreach, GLAM, Education): Tech


 * Type of submission (Presentation Types - Panel, Workshop, Presentation, etc): Curated Talk


 * Author of the submission: Shawn M. Jones


 * E-mail address: sjone@cs.odu.edu


 * Username: Shawnmjones


 * US state or country of origin: Virginia


 * Affiliation, if any (organization, company etc.): Old Dominion University


 * Personal homepage or blog: http://ws-dl.blogspot.com

 * Abstract (at least 300 words to describe your proposal):

The Internet Archive reconstructs web pages from snapshots (Mementos) captured at various points in time. Many pages change more often than the Internet Archive can capture them, so some revisions of a given web page are lost forever. MediaWiki, however, retains all past revisions of a given page, along with its associated external resources. This inspired the development of the Memento MediaWiki Extension as an improvement over the Internet Archive's "drive by" method of digital preservation where MediaWiki sites are involved.

While working on the Memento MediaWiki Extension, we put effort into reconstructing past revisions of each wiki page. The existing software reconstructs the page text in accordance with RFC 7089 (the Memento protocol), but does not pull in past versions of images, JavaScript, CSS, and other external resources, because MediaWiki, as it exists, makes it difficult or impossible to load these resources at page generation time.
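
As a minimal sketch of the datetime negotiation that RFC 7089 defines, the Python below formats the `Accept-Datetime` request header a client sends and parses the `Memento-Datetime` header a server returns. It is illustrative only and is not taken from the extension, which is written in PHP; the function names and the example datetime are assumptions made for this sketch.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(moment: datetime) -> dict:
    """Build the request header a Memento client sends to ask for a past state."""
    # RFC 7089 reuses the HTTP-date format, which is always expressed in GMT.
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


def memento_datetime(response_headers: dict) -> datetime:
    """Read the Memento-Datetime response header back into a datetime."""
    return parsedate_to_datetime(response_headers["Memento-Datetime"])


# Hypothetical datetime chosen only for illustration.
wanted = datetime(2014, 4, 10, 13, 34, tzinfo=timezone.utc)
print(accept_datetime_header(wanted)["Accept-Datetime"])
```

A server that supports the protocol answers with the revision that was current at the requested moment and reports that revision's own datetime in `Memento-Datetime`, which may be earlier than the one requested.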

This curated talk will explore the problems of page reconstruction on the web at large and detail the issues within the MediaWiki code that currently prevent or hinder reconstructing a page in its entirety as it looked at a given revision. We will contrast how MediaWiki pages use the oldid method for accessing old revisions, while images and other uploaded content cannot be accessed in the same way, even though the uploaded content's summary page does use the oldid method. We will examine how old revisions of the CSS and JavaScript can be viewed by humans, but cannot be retrieved inline when MediaWiki needs them. We will also show how Semantic MediaWiki lacks support for accessing past revisions altogether. Finally, we will present some techniques that may solve these problems, including a uniform approach that may address all of them.
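
To make the contrast concrete, the following Python sketch builds an `oldid` URL for a page revision and then picks, by timestamp comparison, the version of an uploaded file that was current when that revision was saved. It is one possible approach under stated assumptions, not the extension's implementation: the wiki address, revision ID, upload history, and function names are hypothetical, and it assumes each stored file version carries an upload timestamp in MediaWiki's 14-digit `YYYYMMDDHHMMSS` form.

```python
from bisect import bisect_right
from typing import List, Optional
from urllib.parse import urlencode


def oldid_url(script_path: str, title: str, oldid: int) -> str:
    """URL of one stored revision of a wiki page, addressed by its oldid."""
    return f"{script_path}?{urlencode({'title': title, 'oldid': oldid})}"


def version_at(upload_times: List[str], page_time: str) -> Optional[str]:
    """Return the latest upload timestamp that is not after page_time.

    upload_times must be sorted ascending. Fixed-width 14-digit timestamps
    sort the same way as text and as dates, so plain string comparison works.
    """
    index = bisect_right(upload_times, page_time)
    return upload_times[index - 1] if index else None


# Hypothetical upload history of a single file, oldest first.
uploads = ["20130105120000", "20140301080000", "20140409230000"]

print(oldid_url("https://wiki.example.org/index.php", "Main_Page", 12345))
print(version_at(uploads, "20140315000000"))
```

A page revision saved before the first upload has no matching file version, so `version_at` returns `None` and the caller has to decide how to render the missing resource.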

Where applicable, code examples and demonstrations will be provided.


 * Length of presentation/talk (see Presentation Types for lengths of different presentation types): 15 Minutes


 * Will you attend WikiConference USA if your submission is not accepted?: Yes


 * Slides or further information (optional):
 * Download page for this extension
 * Slides on Slideshare


 * Special request as to time of presentations: Any date/time is fine

Interested attendees
'''If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).'''


 * 1) Rhododendrites (talk) --09:34, 10 April 2014 (EDT)
 * 2) Add your username here.