This session is part of the WikiCite track.
Latest revision as of 01:27, 27 October 2019
This submission has been accepted for WikiConference North America 2019.
Title:
- 4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata
Theme:
- Relationship Building & Support
+ Tech & Tools
Type of session:
- Presentation
Abstract:
In August 2019 alone, the US National Archives added over 4 million statements, and 400,000 new items, to Wikidata. The edits were made primarily via a custom Python script using the Pywikibot library, running on PAWS, with some additional use of QuickStatements batches.
The project was both a technical endeavor, requiring us to develop a method for accessing large amounts of NARA data, transforming it, and posting it to Wikidata, and an intellectual one. Not only had NARA's data standard never been mapped to Wikidata properties, but archival description itself is still in its early stages on Wikidata, so many of the types of data we were trying to add had never been modeled before.
This presentation will be a deep dive into just how I did it, with thoughts on tools, data modeling, working with data sources, and navigating the community. It will also cover technical challenges and other reflections from the ongoing work, including dealing with large-scale data, gathering metrics and tracking progress, working on behalf of an institution, and getting buy-in from non-technical staff.
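As an illustration of the QuickStatements side of the workflow described in the abstract, the following is a minimal sketch of turning catalog records into a QuickStatements (version 1, tab-separated) batch that creates one item per record. It is not the script used for the project. The record fields (`id`, `title`), the function name, and the item and property IDs passed in the usage example are assumptions made for illustration; check the actual properties and target items on Wikidata before running any batch.

```python
def build_quickstatements(records, id_property, instance_of):
    """Build a QuickStatements V1 batch that creates one item per record.

    records      -- iterable of dicts with hypothetical keys "id" and "title"
    id_property  -- property ID used for the archival identifier (assumption)
    instance_of  -- item ID used as the value of P31, "instance of" (assumption)
    """
    lines = []
    for record in records:
        # Collapse tabs, newlines, and repeated spaces so that each
        # command stays on one tab-separated line.
        title = " ".join(str(record["title"]).split())
        identifier = " ".join(str(record["id"]).split())

        lines.append("CREATE")                               # start a new item
        lines.append(f'LAST\tLen\t"{title}"')                # English label
        lines.append(f"LAST\tP31\t{instance_of}")            # instance of
        lines.append(f'LAST\t{id_property}\t"{identifier}"')  # external identifier
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data and IDs, for illustration only.
    sample = [{"id": "123456", "title": "Sample archival record"}]
    print(build_quickstatements(sample, id_property="P1225", instance_of="Q1"))
```

The output can be pasted into the QuickStatements batch editor. This sketch does not handle titles that contain double quotes, and for millions of statements a Pywikibot script, as the abstract describes, is likely the more practical route.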
Academic Peer Review option:
- No
Author name:
E-mail address:
- dominic@byrd-mcdevitt.com
Wikimedia username:
- Dominic
Affiliated organization(s):
- National Archives and Records Administration
Estimated time:
- 30 minutes
Preferred room size:
Special requests:
Have you presented on this topic previously? If yes, where/when?:
If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)
- Yes