Submissions:2019/4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata

From WikiConference North America
This session is part of the WikiCite track.

This submission has been accepted for WikiConference North America 2019.



Title:

4 Million in 4 Weeks: A case study on bulk import of cultural heritage metadata on Wikidata

Theme:

Relationship Building & Support + Tech & Tools

Type of session:

Presentation

Abstract:

In August 2019 alone, the US National Archives added over 4 million statements and 400,000 new items to Wikidata. Most of the edits were made with a custom Python script built on the Pywikibot library and run on PAWS, supplemented by some QuickStatements batches.

The project was both a technical endeavor and an intellectual one. On the technical side, we had to develop a method for accessing large amounts of NARA data, transforming it, and posting it to Wikidata. On the intellectual side, NARA's data standard had never been mapped to Wikidata properties, and archival description on Wikidata is itself still in its early stages, so many of the types of data we were trying to add had never been modeled before.
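As a rough illustration of the "transforming it" step for the QuickStatements side of the work, here is a minimal sketch that turns simplified records into QuickStatements V1 commands (tab-separated lines, where `CREATE` starts a new item and `LAST` refers to the item just created). It assumes hypothetical record fields `naId` and `title`, hypothetical helper names, and a placeholder class QID that would need to be replaced; P1225 is Wikidata's "U.S. National Archives Identifier" property. It is not the Pywikibot script described above.

```python
from __future__ import annotations

# Wikidata property for the U.S. National Archives Identifier (an external ID).
NARA_ID_PROPERTY = "P1225"


def clean(text: str) -> str:
    """Collapse tabs, newlines, and repeated spaces, which would break the
    tab-separated QuickStatements format."""
    cleaned = " ".join(text.split())
    # Assumption for this sketch: embedded double quotes are rejected rather
    # than escaped, since they would terminate the quoted string value.
    if '"' in cleaned:
        raise ValueError(f"title contains a double quote: {cleaned!r}")
    return cleaned


def record_to_commands(record: dict, instance_of: str) -> list[str]:
    """Build QuickStatements V1 commands that create one item for a
    hypothetical simplified record with 'naId' and 'title' fields."""
    title = clean(record["title"])
    return [
        "CREATE",                                           # new item
        f'LAST\tLen\t"{title}"',                            # English label
        f"LAST\tP31\t{instance_of}",                        # instance of (placeholder class)
        f'LAST\t{NARA_ID_PROPERTY}\t"{record["naId"]}"',    # NARA identifier as a string
    ]


def to_batch(records: list[dict], instance_of: str) -> str:
    """Join the commands for many records into one pasteable batch."""
    return "\n".join(
        command
        for record in records
        for command in record_to_commands(record, instance_of)
    )


if __name__ == "__main__":
    # "Q_CLASS" is a hypothetical placeholder, not a real Wikidata item.
    sample = [{"naId": "12345", "title": "Photograph of\ta  dam"}]
    print(to_batch(sample, instance_of="Q_CLASS"))
```

Generating plain-text batches like this keeps the output reviewable before anything is posted, which is one reason QuickStatements suits smaller jobs; a script calling Pywikibot directly avoids the manual paste step at larger scale.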

This presentation will be a deep dive into how I did it, covering tools, data modeling, working with data sources, and navigating the community. It will also cover technical challenges and other reflections from the ongoing work, including handling large-scale data, gathering metrics and tracking progress, and working on behalf of an institution while getting buy-in from non-technical staff.

Academic Peer Review option:

No

Author name:

Dominic Byrd-McDevitt

E-mail address:

dominic@byrd-mcdevitt.com

Wikimedia username:

Dominic

Affiliated organization(s):

National Archives and Records Administration

Estimated time:

30 minutes

Preferred room size:

Special requests:

Have you presented on this topic previously? If yes, where/when?:

If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)

Yes