Submissions:2023/A positive feedback loop between Wikidata and text mining
- A positive feedback loop between Wikidata and text mining
- Languages, Research / Science / Medicine, Technology, Credibility / Mis and Disinformation (WikiCred)
Type of session:
- Lightning talk
Wikidata's items contain information about concepts and relationships between them, structured in a way that is somewhat language-agnostic. Wikidata also contains language-specific information, particularly in the lexeme namespace, which annotates words and phrases and their various forms and functions on a per-language basis. Specific meanings of lexemes can be linked to the corresponding items.
Text mining workflows often contain steps that try to map strings to linguistic forms and functions and from there to potential meanings.
This talk makes the case that Wikidata's information about lexemes and their relationships to items can, in principle, assist text mining, although the workflows for doing so have room for improvement. Conversely, text mining workflows typically pursue specific targets but have to process large amounts of text to find them, and some of the intermediate results produced along the way may be useful for Wikidata.
The talk will sketch some ways of connecting Wikidata and text mining workflows in both directions and highlight some examples.
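To make the two directions concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the workflow presented in the talk: the lexeme records are a hand-written toy sample, all identifiers (`L-demo-1`, `Q-demo-rodent`, and so on) are made-up placeholders rather than real Wikidata IDs, and the helper names `build_form_index`, `annotate` and `senses_without_items` are hypothetical. Tokenization is a naive whitespace split.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sense:
    sense_id: str            # placeholder ID, not a real Wikidata sense
    gloss: str
    item_id: Optional[str]   # item linked to this sense, if any


@dataclass
class Lexeme:
    lexeme_id: str               # placeholder ID, not a real Wikidata lexeme
    language: str
    lemma: str
    forms: Dict[str, List[str]]  # surface form -> grammatical features
    senses: List[Sense]


# Toy sample standing in for data that would come from Wikidata.
LEXEMES = [
    Lexeme("L-demo-1", "en", "mouse",
           forms={"mouse": ["singular"], "mice": ["plural"]},
           senses=[Sense("L-demo-1-S1", "small rodent", "Q-demo-rodent"),
                   Sense("L-demo-1-S2", "pointing device", "Q-demo-device")]),
    Lexeme("L-demo-2", "en", "gene",
           forms={"gene": ["singular"], "genes": ["plural"]},
           senses=[Sense("L-demo-2-S1", "unit of heredity", None)]),
]


def build_form_index(lexemes):
    """Map each lower-cased surface form to the lexemes that list it."""
    index = defaultdict(list)
    for lexeme in lexemes:
        for surface, features in lexeme.forms.items():
            index[surface.lower()].append((lexeme, features))
    return index


def annotate(text, index):
    """Wikidata -> text mining: string -> form -> candidate items.

    Returns the matches plus a count of tokens that matched no form.
    """
    matched, unmatched = [], Counter()
    for raw in text.lower().split():
        token = raw.strip(".,;:!?()\"'")
        if not token:
            continue
        hits = index.get(token)
        if not hits:
            unmatched[token] += 1
            continue
        for lexeme, features in hits:
            items = [s.item_id for s in lexeme.senses if s.item_id]
            matched.append((token, lexeme.lexeme_id, features, items))
    return matched, unmatched


def senses_without_items(lexemes):
    """Text mining -> Wikidata: senses that still lack a linked item."""
    return [s.sense_id for lx in lexemes for s in lx.senses if s.item_id is None]


if __name__ == "__main__":
    idx = build_form_index(LEXEMES)
    hits, misses = annotate("Mice carry genes. Knockout mice lack genes.", idx)
    print(hits)
    print(misses.most_common())
    print(senses_without_items(LEXEMES))
```

In this sketch, the matches returned by `annotate` represent the direction from Wikidata to text mining, with ambiguous forms yielding several candidate items that a later step would have to disambiguate. The unmatched token counts and the senses without a linked item represent the opposite direction: by-products of a mining run that could be reviewed as candidates for new forms, lexemes or sense-to-item links.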
- Slides: here
- Daniel Mietchen
- 15 min including discussion
Have you presented on this topic previously? If yes, where/when?:
Okay to livestream?
- Livestreaming is okay
If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)