Difference between revisions of "Submissions:2015/Metadata in the Commons"

From WikiConference North America
(Added self)
 
(5 intermediate revisions by 5 users not shown)
Line 1: Line 1:
 
<!-- Simply provide information about your submission below and save the page. -->
 
;Title: Metadata in the Commons Today
+
;Title: Metadata from the Commons
   
 
;[[Submissions#Proposal Themes|Theme]]: GLAM and tech <!-- community, tech, outreach, GLAM, or education -->
 
Line 16: Line 16:
 
;Abstract: <!-- at least 300 words to describe your proposal -->
 
   
  +
[[File:Beardedtit46.jpg|right|thumb|What is in this picture? [http://commons.dbpedia.org/page/File:Beardedtit46.jpg DBpedia knows] that it's an image of ''Panurus biarmicus'' created by John Gerrard Keulemans in 1897.]]
The Wikimedia Commons, one of the largest multimedia repositories of open-source content in the world, is an obvious connection point for [[w:GLAM (industry sector)|GLAMs]] and the world of open source: GLAMs would like more people to discover the treasures they keep in their collections, and the open source community would like high-quality images, recordings and metadata to improve, reuse and share with the world. An incentive for GLAMs to share their content is the open source community's ability to improve existing content: paintings and photographs [[commons:Commons:Media restoration|have been restored]], scanned documents have been [[w:Wikisource|transcribed]] and [[w:Wikibooks|incorporated into textbooks]], illustrations in scanned materials have been [https://blog.archive.org/2014/08/29/millions-of-historic-images-posted-to-flickr/ automatically extracted and shared], and metadata can be [[wikibooks:Crowdsourcing/Restoration_and_reuse_of_images/Improving_image_metadata|added and improved]]. Some GLAMs have turned to crowdsourcing as a way to [http://blog.biodiversitylibrary.org/2014/11/crowdsourcing-and-bhl-current-projects.html classify illustrations] or [http://www.notesfromnature.org/ transcribe handwritten text]. With its deep ties to Wikipedia, probably the best known open source project in the world, the Commons is perfectly placed to act as a intermediary for such content, with GLAMs using metadata-aware uploaders such as the [[commons:Commons:GLAMwiki Toolset Project|GLAMwiki Toolset]] to contribute their content to the Commons, and then to develop custom software to download improved content and metadata back from the Commons into their local databases.
 
  +
 
The Wikimedia Commons, one of the largest multimedia repositories of open-source content in the world, is an obvious connection point for [[w:GLAM (industry sector)|GLAMs]] and the world of open source: GLAMs would like more people to discover the treasures they keep in their collections, and the open source community would like high-quality images, recordings and metadata to improve, reuse and share with the world. An incentive for GLAMs to share their content is the open source community's ability to improve existing content: paintings and photographs [[commons:Commons:Media restoration|have been restored]], scanned documents have been [[w:Wikisource|transcribed]] and [[w:Wikibooks|incorporated into textbooks]], illustrations in scanned materials have been [https://blog.archive.org/2014/08/29/millions-of-historic-images-posted-to-flickr/ automatically extracted and shared], and metadata can be [[wikibooks:Crowdsourcing/Restoration_and_reuse_of_images/Improving_image_metadata|added and improved]]. Some GLAMs have turned to crowdsourcing as a way to [http://blog.biodiversitylibrary.org/2014/11/crowdsourcing-and-bhl-current-projects.html classify illustrations] or [http://www.notesfromnature.org/ transcribe handwritten text]. With its deep ties to Wikipedia, probably the best known open-access project in the world, the Commons is perfectly placed to act as a intermediary for such content, with GLAMs using metadata-aware uploaders such as the [[commons:Commons:GLAMwiki Toolset Project|GLAMwiki Toolset]] to contribute their content to the Commons, and then to develop custom software to download improved content and metadata back from the Commons into their local databases.
   
 
In the last two years, I've worked on two projects that work towards this goal. Last year, I undertook a Google Summer of Code project to build [[commons:User:Gaurav/DBpedia|the DBpedia Commons]], an extension of the DBpedia project to extract RDF-based data from the content associated with files in the Commons. In my presentation, I will describe how DBpedia Commons extracts metadata from the Commons, what sort of metadata it is able to extract, and how to download that for your own use.
 
   
Two years ago, I worked with the [[w:Biodiversity Heritage Library|Biodiversity Heritage Library]] (BHL) to develop a Commons template for storing the metadata they provide (a precursor to the much better customization available through the GLAMwiki Toolkit), and our plan for setting up a continuous flow of metadata from the BHL to the Commons and back again. I will also describe how I would implement such a system today, using DBpedia's infrastructure and ontologies to create a data model that would let users on the Commons and databases at the BHL speak the same language and share categorical, descriptive and numerical data back and forth.
+
Two years ago, I worked with the [[w:Biodiversity Heritage Library|Biodiversity Heritage Library]] (BHL) to develop a Commons template for storing the metadata they provide (a precursor to the much better customization available through the GLAMwiki Toolkit), and our plan for setting up a continuous flow of metadata from the BHL to the Commons and back again. I will also describe how I would implement such a system today, using DBpedia's infrastructure and ontologies to create a data model that would let users on the Commons and BHL's databases speak the same language and share categorical, descriptive and numerical data back and forth.
   
 
In the future, the [[commons:Commons:Structured_data|Structured Data project]] and [[commons:Commons:Wikidata|Commons-Wikidata integration]] will make such workflows easy to set up, but my presentation will provide information on what GLAMs are trying to do today, describe some software that is already available to set up such workflows, and what additional features the Commons and Structured Data Projects should aim to provide to encourage mutually beneficial metadata collaborations between GLAMs and Wikipedia.
 
Line 42: Line 44:
   
 
# [[User:Emw|Emw]] ([[User talk:Emw|talk]]) 11:28, 30 August 2015 (EDT)
 
  +
# [[User:Rhododendrites|Rhododendrites]] ([[User talk:Rhododendrites|talk]]) 15:42, 30 August 2015 (EDT)
  +
# --[[User:Frank Schulenburg|Frank Schulenburg]] ([[User talk:Frank Schulenburg|talk]]) 22:19, 31 August 2015 (EDT)
  +
# [[User:Gamaliel|Gamaliel]] ([[User talk:Gamaliel|talk]]) 21:41, 4 September 2015 (EDT)
  +
# [[User:Kosboot|Kosboot]] ([[User talk:Kosboot|talk]]) 10:54, 4 October 2015 (EDT)
 
# ''Add your username here.''
 
   
 
[[Category:Submissions/2015]]
 
  +
[[Category:Submissions in 2015, Wikimedia Commons]]
  +
[[Category:Submissions in 2015, Wikidata]]
  +
[[Category:Submissions in 2015, GLAM]]

Latest revision as of 14:54, 4 October 2015

Title
Metadata from the Commons
Theme
GLAM and tech
Type of submission
Presentation
Author
Gaurav Vaidya
E-mail address
gaurav[at]ggvaidya[dot]com
Username
w:User:Gaurav
Affiliation
PhD Candidate, Ecology and Evolutionary Biology, University of Colorado Boulder
Abstract
What is in this picture? DBpedia knows that it's an image of Panurus biarmicus created by John Gerrard Keulemans in 1897.

The Wikimedia Commons, one of the largest multimedia repositories of open-source content in the world, is an obvious connection point for GLAMs and the world of open source: GLAMs would like more people to discover the treasures they keep in their collections, and the open source community would like high-quality images, recordings and metadata to improve, reuse and share with the world. An incentive for GLAMs to share their content is the open source community's ability to improve existing content: paintings and photographs have been restored, scanned documents have been transcribed and incorporated into textbooks, illustrations in scanned materials have been automatically extracted and shared, and metadata can be added and improved. Some GLAMs have turned to crowdsourcing as a way to classify illustrations or transcribe handwritten text. With its deep ties to Wikipedia, probably the best-known open-access project in the world, the Commons is perfectly placed to act as an intermediary for such content: GLAMs can use metadata-aware uploaders such as the GLAMwiki Toolset to contribute their content to the Commons, and then develop custom software to download improved content and metadata back from the Commons into their local databases.

In the last two years, I've worked on two projects that advance this goal. Last year, I undertook a Google Summer of Code project to build the DBpedia Commons, an extension of the DBpedia project that extracts RDF-based data from the content associated with files in the Commons. In my presentation, I will describe how DBpedia Commons extracts metadata from the Commons, what sort of metadata it is able to extract, and how to download that metadata for your own use.
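As a rough illustration of what working with such RDF data might look like once downloaded, here is a minimal sketch in Python. It assumes the data is available as N-Triples text, which is one common RDF serialization; the resource IRI form and the Dublin Core predicates in the sample are assumptions made for this example and are not necessarily what DBpedia Commons emits. The sample values simply restate the facts from the image caption above, and `parse_ntriples` is a hypothetical helper, not part of any DBpedia tool.

```python
import re
from collections import defaultdict

# One simple N-Triples statement: <subject> <predicate> (<IRI> | "literal") .
# Blank nodes are not handled; any language tag or datatype is discarded.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"\S*)\s*\.\s*$'
)


def parse_ntriples(text):
    """Group object values by subject and predicate.

    Blank lines, comment lines and statements the pattern cannot read
    are skipped. Literal escape sequences are left undecoded.
    """
    graph = defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            continue
        subject, predicate, iri, literal = match.groups()
        graph[subject][predicate].append(iri if iri is not None else literal)
    return graph


# Illustrative sample only: the IRI form and predicates are assumptions.
FILE_IRI = "http://commons.dbpedia.org/resource/File:Beardedtit46.jpg"
DC = "http://purl.org/dc/elements/1.1/"
SAMPLE = f'''
<{FILE_IRI}> <{DC}creator> "John Gerrard Keulemans" .
<{FILE_IRI}> <{DC}date> "1897" .
<{FILE_IRI}> <{DC}subject> "Panurus biarmicus" .
'''

if __name__ == "__main__":
    metadata = parse_ntriples(SAMPLE)
    for predicate, values in metadata[FILE_IRI].items():
        print(predicate, values)
```

For real use, a full RDF library would be a safer choice than a hand-written pattern, since N-Triples also allows blank nodes and escaped characters that this sketch does not decode.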

Two years ago, I worked with the Biodiversity Heritage Library (BHL) to develop a Commons template for storing the metadata they provide (a precursor to the much better customization available through the GLAMwiki Toolset), and a plan for setting up a continuous flow of metadata from the BHL to the Commons and back again. I will also describe how I would implement such a system today, using DBpedia's infrastructure and ontologies to create a data model that would let users on the Commons and BHL's databases speak the same language and share categorical, descriptive and numerical data back and forth.

In the future, the Structured Data project and Commons-Wikidata integration will make such workflows easy to set up. My presentation will cover what GLAMs are trying to do today, describe some software that is already available to set up such workflows, and suggest what additional features the Commons and Structured Data projects should aim to provide to encourage mutually beneficial metadata collaborations between GLAMs and Wikipedia.

Length of presentation

30 minutes

Special schedule requests

I'd prefer not to present on Friday, since I'd probably fly into DC early on Friday morning

Will you attend WikiConference USA if your submission is not accepted?

Only if I get the scholarship!

Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes. (~~~~).

  1. Emw (talk) 11:28, 30 August 2015 (EDT)
  2. Rhododendrites (talk) 15:42, 30 August 2015 (EDT)
  3. --Frank Schulenburg (talk) 22:19, 31 August 2015 (EDT)
  4. Gamaliel (talk) 21:41, 4 September 2015 (EDT)
  5. Kosboot (talk) 10:54, 4 October 2015 (EDT)
  6. Add your username here.