Submissions:2019/4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata
This submission has been noted and is pending review for WikiConference North America 2019.
Title:
- 4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata
Theme:
- Relationship Building & Support
- Tech & Tools
Type of session:
- Presentation
Abstract:
In August 2019 alone, the US National Archives added over 4 million statements, and 400,000 new items, to Wikidata. The edits were made primarily via a custom Python script using the Pywikibot library, running on PAWS, with some additional use of QuickStatements batches.
The project was both a technical endeavor and an intellectual one. Technically, it required us to develop a method for accessing large amounts of NARA data, transforming it, and posting it to Wikidata. Intellectually, NARA's data standard had not previously been mapped to Wikidata properties, and archival description on Wikidata is itself still in its early stages, so many of the types of data we were trying to add had never been modeled before.
This presentation will be a deep dive into just how I did it, with thoughts on tools, data modeling, working with data sources, and navigating the community. It will also cover technical challenges and other reflections from the ongoing work, including dealing with large-scale data, gathering metrics and tracking progress, working on behalf of an institution, and getting buy-in from non-technical staff.
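As a rough illustration of the "transform" step mentioned in the abstract, the sketch below turns a few source records into a QuickStatements (V1, tab-separated) batch that creates one item per record. It is not the script used in the project. The record fields (`naid`, `title`), the function name, and the choice of statements are assumptions made for the example, and the property and item IDs should be verified on Wikidata before any real use.

```python
def to_quickstatements(records, instance_of_qid):
    """Build a QuickStatements V1 batch that creates one item per record.

    `records` is an iterable of dicts with hypothetical keys "naid" and
    "title"; `instance_of_qid` is the QID to use as the value of P31.
    """
    lines = []
    for rec in records:
        # Tabs and newlines would break the tab-separated format,
        # so collapse all whitespace runs in the label to single spaces.
        label = " ".join(rec["title"].split())
        lines.append("CREATE")                           # start a new item
        lines.append(f'LAST\tLen\t"{label}"')            # English label
        lines.append(f"LAST\tP31\t{instance_of_qid}")    # instance of
        # P1225 is assumed here to be the National Archives identifier
        # property; confirm the property ID before running a batch.
        lines.append(f'LAST\tP1225\t"{rec["naid"]}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample record; the QID below is a placeholder, not a real class.
    sample = [{"naid": "12345", "title": "Sample record title"}]
    print(to_quickstatements(sample, "Q4115189"))
```

Generating the batch as plain text keeps the transform separate from the upload, so the output can be reviewed by hand or spot-checked on a handful of items before anything is posted.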
Academic Peer Review option:
- No
Author name:
E-mail address:
- dominic@byrd-mcdevitt.com
Wikimedia username:
- Dominic
Affiliated organization(s):
- National Archives and Records Administration
Estimated time:
- 30 minutes
Preferred room size:
Special requests:
Have you presented on this topic previously? If yes, where/when?:
If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)
- Yes