Submissions:2015/Metadata in the Commons

From WikiConference North America
Revision as of 02:26, 1 September 2015 by Gaurav (talk | contribs) (Tweaked the text and add an illustration.)

Title
Metadata from the Commons
Theme
GLAM and tech
Type of submission
Presentation
Author
Gaurav Vaidya
E-mail address
gaurav[at]ggvaidya[dot]com
Username
w:User:Gaurav
Affiliation
PhD Candidate, Ecology and Evolutionary Biology, University of Colorado Boulder
Abstract
What is in this picture? DBpedia knows that it's an image of Panurus biarmicus created by John Gerrard Keulemans in 1897.

The Wikimedia Commons, one of the largest multimedia repositories of open-source content in the world, is an obvious connection point for GLAMs and the world of open source: GLAMs would like more people to discover the treasures they keep in their collections, and the open source community would like high-quality images, recordings and metadata to improve, reuse and share with the world. An incentive for GLAMs to share their content is the open source community's ability to improve existing content: paintings and photographs have been restored, scanned documents have been transcribed and incorporated into textbooks, illustrations in scanned materials have been automatically extracted and shared, and metadata has been added and improved. Some GLAMs have turned to crowdsourcing as a way to classify illustrations or transcribe handwritten text. With its deep ties to Wikipedia, probably the best-known open-access project in the world, the Commons is well placed to act as an intermediary for such content: GLAMs can use metadata-aware uploaders such as the GLAMwiki Toolset to contribute their content to the Commons, and then develop custom software to download improved content and metadata from the Commons back into their local databases.

In the last two years, I've worked on two projects towards this goal. Last year, I undertook a Google Summer of Code project to build DBpedia Commons, an extension of the DBpedia project that extracts RDF-based data from the content associated with files in the Commons. In my presentation, I will describe how DBpedia Commons extracts metadata from the Commons, what sort of metadata it is able to extract, and how to download that metadata for your own use.

Two years ago, I worked with the Biodiversity Heritage Library (BHL) to develop a Commons template for storing the metadata they provide (a precursor to the much better customization available through the GLAMwiki Toolset), along with a plan for setting up a continuous flow of metadata from the BHL to the Commons and back again. I will also describe how I would implement such a system today, using DBpedia's infrastructure and ontologies to create a data model that would let users on the Commons and BHL's databases speak the same language and share categorical, descriptive and numerical data back and forth.

In the future, the Structured Data project and Commons-Wikidata integration will make such workflows easy to set up. My presentation will cover what GLAMs are trying to do today, describe some software that is already available to set up such workflows, and suggest what additional features the Commons and Structured Data projects should aim to provide to encourage mutually beneficial metadata collaborations between GLAMs and Wikipedia.

Length of presentation

30 minutes

Special schedule requests

I'd prefer not to present on Friday, since I'll probably fly into DC early on Friday morning.

Will you attend WikiConference USA if your submission is not accepted?

Only if I get the scholarship!

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Emw (talk) 11:28, 30 August 2015 (EDT)
  2. Rhododendrites (talk) 15:42, 30 August 2015 (EDT)
  3. --Frank Schulenburg (talk) 22:19, 31 August 2015 (EDT)
  4. Add your username here.