Submissions:2019/4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata

From WikiConference North America
Revision as of 18:51, 22 September 2019 by Dominic

This submission has been noted and is pending review for WikiConference North America 2019.


4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata


Relationship Building & Support
+ Tech & Tools

Type of session:

Presentation



In August 2019 alone, the US National Archives added over 4 million statements, and 400,000 new items, to Wikidata. The edits were made primarily via a custom Python script using the Pywikibot library, running on PAWS, with some additional use of QuickStatements batches.

The project was both a technical endeavor and an intellectual one. Technically, it required us to develop a method for accessing large amounts of NARA data, transforming it, and posting it to Wikidata. Intellectually, NARA's data standard had not previously been mapped to Wikidata properties, and archival description itself is still in its early stages on Wikidata, so many of the types of data we were trying to add had never been modeled before.
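As a rough illustration of the "transform" step, the sketch below turns one source record into QuickStatements (V1, tab-separated) commands that create an item. It is not the script used in the project. The record field names (`title`, `naId`), the `FIELD_TO_PROPERTY` mapping, and the function name are hypothetical, and the use of P1225 for the National Archives identifier is an assumption that should be checked against Wikidata before use.

```python
from typing import Dict, List

# Hypothetical mapping from source field names to Wikidata property IDs.
# P1225 is assumed here to be the National Archives identifier property;
# verify it on Wikidata before relying on it.
FIELD_TO_PROPERTY: Dict[str, str] = {
    "naId": "P1225",
}


def record_to_quickstatements(record: Dict[str, str], instance_of_qid: str) -> List[str]:
    """Build QuickStatements V1 lines that create one item from one record.

    `record` is a hypothetical, already-flattened source record.
    `instance_of_qid` is the QID chosen by the caller for the P31 statement.
    Titles containing double quotes are not handled in this sketch.
    """
    lines = ["CREATE"]
    # English label for the new item.
    lines.append(f'LAST\tLen\t"{record["title"]}"')
    # "instance of" statement pointing at the caller-supplied class.
    lines.append(f"LAST\tP31\t{instance_of_qid}")
    # One string-valued statement per mapped field present in the record.
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        if value is not None:
            lines.append(f'LAST\t{prop}\t"{value}"')
    return lines


if __name__ == "__main__":
    example = {"title": "Example series", "naId": "12345"}
    print("\n".join(record_to_quickstatements(example, "Q1")))
```

Keeping the field-to-property mapping in a single table separates the modeling decisions from the code that emits statements, so the mapping can be reviewed with the community without touching the rest of the script.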

This presentation will be a deep dive into just how I did it, with thoughts on tools, data modeling, working with data sources, and navigating the community. It will also cover technical challenges and other reflections from the ongoing work, including dealing with large-scale data, gathering metrics and tracking progress, working on behalf of an institution, and getting buy-in from non-technical staff.

Academic Peer Review option:


Author name:

Dominic Byrd-McDevitt

E-mail address:

Wikimedia username:


Affiliated organization(s):

National Archives and Records Administration

Estimated time:

30 minutes

Preferred room size:

Special requests:

Have you presented on this topic previously? If yes, where/when?:

If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)