Submissions:2015/Metadata in the Commons

Title: Metadata in the Commons Today

Theme: GLAM and tech

Type of submission: Presentation

Author: Gaurav Vaidya

E-mail address: gaurav[at]ggvaidya[dot]com

Username: w:User:Gaurav

Affiliation: PhD Candidate, Ecology and Evolutionary Biology, University of Colorado Boulder

Abstract

The Wikimedia Commons, one of the largest multimedia repositories of open-source content in the world, is an obvious connection point for GLAMs and the world of open source: GLAMs would like more people to discover the treasures they keep in their collections, and the open source community would like high-quality images, recordings and metadata to improve, reuse and share with the world. An incentive for GLAMs to share their content is the open source community's ability to improve existing content: paintings and photographs have been restored, scanned documents have been transcribed and incorporated into textbooks, illustrations in scanned materials have been automatically extracted and shared, and metadata can be added and improved. Some GLAMs have turned to crowdsourcing as a way to classify illustrations or transcribe handwritten text. With its deep ties to Wikipedia, probably the best-known open source project in the world, the Commons is perfectly placed to act as an intermediary for such content, with GLAMs using metadata-aware uploaders such as the GLAMwiki Toolset to contribute their content to the Commons, and then developing custom software to download improved content and metadata back from the Commons into their local databases.

In the last two years, I've worked on two projects that work towards this goal. Last year, I undertook a Google Summer of Code project to build the DBpedia Commons, an extension of the DBpedia project to extract RDF-based data from the content associated with files in the Commons. In my presentation, I will describe how DBpedia Commons extracts metadata from the Commons, what sort of metadata it is able to extract, and how to download it for your own use.

Two years ago, I worked with the Biodiversity Heritage Library (BHL) to develop a Commons template for storing the metadata they provide (a precursor to the much better customization available through the GLAMwiki Toolset), along with a plan for setting up a continuous flow of metadata from the BHL to the Commons and back again. I will also describe how I would implement such a system today, using DBpedia's infrastructure and ontologies to create a data model that would let users on the Commons and databases at the BHL speak the same language and share categorical, descriptive and numerical data back and forth.

In the future, the Structured Data project and Commons-Wikidata integration will make such workflows easy to set up, but my presentation will provide information on what GLAMs are trying to do today, describe some software that is already available to set up such workflows, and suggest what additional features the Commons and Structured Data projects should aim to provide to encourage mutually beneficial metadata collaborations between GLAMs and Wikipedia.

Length of presentation

30 minutes

Special schedule requests

I'd prefer not to present on Friday, since I'd probably fly into DC early on Friday morning.

Will you attend WikiConference USA if your submission is not accepted?

Only if I get the scholarship!

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).

Add your username here.
