Submissions:2023/A positive feedback loop between Wikidata and text mining

From WikiConference North America

This submission has been noted and is pending review for WikiConference North America 2023.


A positive feedback loop between Wikidata and text mining


Languages, Research / Science / Medicine, Technology, Credibility / Mis and Disinformation (WikiCred)

Type of session:

Lightning talk


Wikidata's items contain information about concepts and relationships between them, structured in a way that is somewhat language-agnostic. Wikidata also contains language-specific information, particularly in the lexeme namespace, which annotates words and phrases and their various forms and functions on a per-language basis. Specific meanings of lexemes can be linked to the corresponding items.

Text mining workflows often contain steps that try to map strings to linguistic forms and functions and from there to potential meanings.
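As a rough illustration of that mapping step, the sketch below models lexemes, forms and senses in memory and looks up a token's candidate meanings. It assumes a simplified version of the Wikidata lexeme model in which each sense may link to an item through the "item for this sense" property (P5137). The class names, function names and the identifiers `L-EX1`, `L-EX2`, `Q-EX-RODENT` and `Q-EX-DEVICE` are made-up placeholders, not real Wikidata entities, and a real workflow would load its data from Wikidata rather than hard-code it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Form:
    representation: str          # the written form, e.g. "mice"
    features: tuple[str, ...]    # grammatical features, e.g. ("plural",)


@dataclass
class Sense:
    gloss: str
    item: Optional[str]          # linked item (P5137), if any; placeholder IDs here


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    language: str
    category: str
    forms: list[Form] = field(default_factory=list)
    senses: list[Sense] = field(default_factory=list)


# Toy data with placeholder identifiers (not real Wikidata IDs).
LEXEMES = [
    Lexeme(
        "L-EX1", "mouse", "en", "noun",
        forms=[Form("mouse", ("singular",)), Form("mice", ("plural",))],
        senses=[
            Sense("small rodent", "Q-EX-RODENT"),
            Sense("computer pointing device", "Q-EX-DEVICE"),
        ],
    ),
    Lexeme(
        "L-EX2", "run", "en", "verb",
        forms=[Form("run", ("present",)), Form("ran", ("past",))],
        senses=[Sense("move quickly on foot", None)],  # sense not yet linked
    ),
]


def build_form_index(lexemes):
    """Map each lower-cased form representation to its (lexeme, form) pairs."""
    index = {}
    for lexeme in lexemes:
        for form in lexeme.forms:
            index.setdefault(form.representation.lower(), []).append((lexeme, form))
    return index


def candidate_meanings(token, index):
    """Return (lexeme ID, grammatical features, item ID) for every linked sense."""
    results = []
    for lexeme, form in index.get(token.lower(), []):
        for sense in lexeme.senses:
            if sense.item is not None:
                results.append((lexeme.lexeme_id, form.features, sense.item))
    return results


FORM_INDEX = build_form_index(LEXEMES)
```

With this toy data, `candidate_meanings("Mice", FORM_INDEX)` yields two candidates, one per linked sense of the placeholder lexeme `L-EX1`, while `candidate_meanings("ran", FORM_INDEX)` yields none because the only sense of `L-EX2` has no item link. Choosing between candidates would still require a disambiguation step that is not shown here.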

This talk makes the case that Wikidata's information about lexemes and their links to items can already assist text mining in principle, although the workflows for doing so have room for improvement. Conversely, text mining workflows usually pursue specific targets but have to process large amounts of text to find them, and some of the intermediate results produced along the way may be useful for Wikidata.
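One way to picture the reverse direction is a small sketch in which the tokens seen while processing a text are compared against a set of known word forms, and the frequent unmatched tokens are kept as candidates for human review. The function name `unmatched_tokens`, the `min_count` threshold and the toy set `KNOWN_FORMS` are assumptions made for illustration. This is not a description of an existing Wikidata tool, and whether a given candidate belongs in Wikidata remains an editorial decision.

```python
import re
from collections import Counter

# Toy stand-in for the word forms already recorded as lexeme forms.
KNOWN_FORMS = {"the", "mouse", "mice", "ran"}


def unmatched_tokens(text, known_forms, min_count=2):
    """Count word tokens that match no known form and keep the frequent ones.

    Returns (token, count) pairs sorted by descending count, as candidates
    for review rather than as automatic additions.
    """
    # Runs of letters only; digits, underscores and punctuation split tokens.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in known_forms)
    return [(tok, n) for tok, n in counts.most_common() if n >= min_count]


sample = "The mice ran. The voles ran. Voles hide."
print(unmatched_tokens(sample, KNOWN_FORMS))  # [('voles', 2)]
```

The threshold filters out one-off tokens such as typos, at the cost of also hiding rare but legitimate words such as "hide" in the sample above.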

The talk will sketch some workflows that connect Wikidata and text mining in both directions and highlight some examples.

Author name:

Daniel Mietchen

E-mail address:

Wikimedia username:

Daniel Mietchen

Affiliated organization(s):

Estimated time:

7 min

Special requests:

Have you presented on this topic previously? If yes, where/when?:


Okay to livestream?

Livestreaming is okay

If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)