Difference between revisions of "Submissions:2023/A positive feedback loop between Wikidata and text mining"

From WikiConference North America
(Created page with "{{WCNA 2023 Session Submission |theme=Languages, Research / Science / Medicine, Technology, Credibility / Mis and Disinformation (WikiCred) |type=Lightning talk |abstract=Wiki...")
 
(abstract)
Line 2:
  |theme=Languages, Research / Science / Medicine, Technology, Credibility / Mis and Disinformation (WikiCred)
  |type=Lightning talk
- |abstract=Wikidata's information about lexemes and their relationships to items can assist text-mining efforts in principle, yet the workflows for that have room for improvement.
+ |abstract=Wikidata's items contain information about concepts and relationships between them, structured in a way that is somewhat language-agnostic. Wikidata also contains language-specific information, particularly in the lexeme namespace, which annotates words and phrases and their various forms and functions on a per-language basis. Specific meanings of lexemes can be linked to the corresponding items.
+ Text mining workflows often contain steps that try to map strings to linguistic forms and functions and from there to potential meanings.
+ This talk makes the case that Wikidata's information about lexemes and their relationships to items can assist text-mining efforts in principle, yet the workflows for that have room for improvement.
  Conversely, text mining workflows typically have specific targets, yet they have to process large amounts of text to find these targets. Some of the intermediate results produced on the way may be useful for Wikidata.
- This talk will sketch out some workflows that can connect Wikidata and mining workflows in both directions and highlight some examples.
+ The talk will sketch out some workflows that can connect Wikidata and mining workflows in both directions and highlight some examples.
  |author=Daniel Mietchen
  |email=daniel.mietchen{{@}}wikipedia.de

Revision as of 23:11, 23 July 2023

This submission has been noted and is pending review for WikiConference North America 2023.



Title:

A positive feedback loop between Wikidata and text mining

Theme:

Languages, Research / Science / Medicine, Technology, Credibility / Mis and Disinformation (WikiCred)

Type of session:

Lightning talk

Abstract:

Wikidata's items contain information about concepts and relationships between them, structured in a way that is somewhat language-agnostic. Wikidata also contains language-specific information, particularly in the lexeme namespace, which annotates words and phrases and their various forms and functions on a per-language basis. Specific meanings of lexemes can be linked to the corresponding items.

Text mining workflows often contain steps that try to map strings to linguistic forms and functions and from there to potential meanings.
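
As a rough illustration of that mapping (string, then form, then lexeme, then sense, then item), here is a minimal Python sketch. It is not taken from the talk: the class names, the `candidate_items` helper, and all identifiers such as `L-toy-1` and `Q-toy-river-bank` are hypothetical placeholders rather than real Wikidata IDs, and the toy data is held in memory instead of being fetched from Wikidata.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sense:
    gloss: str
    item_id: str | None  # placeholder for a linked item; None if no item is linked


@dataclass
class Lexeme:
    lexeme_id: str          # placeholder ID, not a real Wikidata lexeme ID
    language: str
    lexical_category: str
    forms: dict[str, list[str]]  # surface form -> grammatical features
    senses: list[Sense] = field(default_factory=list)


# Toy data, invented for illustration only.
TOY_LEXEMES = [
    Lexeme(
        lexeme_id="L-toy-1",
        language="en",
        lexical_category="noun",
        forms={"bank": ["singular"], "banks": ["plural"]},
        senses=[
            Sense("financial institution", "Q-toy-financial-institution"),
            Sense("land alongside a river", "Q-toy-river-bank"),
        ],
    ),
    Lexeme(
        lexeme_id="L-toy-2",
        language="en",
        lexical_category="verb",
        forms={"bank": ["infinitive"], "banks": ["third-person singular"]},
        senses=[Sense("to tilt an aircraft in a turn", None)],
    ),
]


def candidate_items(token: str, lexemes: list[Lexeme], language: str):
    """Return (lexeme ID, features, item ID) triples for a token.

    A token is matched case-insensitively against the forms of each lexeme
    in the given language; senses without a linked item are skipped.
    """
    results = []
    for lexeme in lexemes:
        if lexeme.language != language:
            continue
        features = lexeme.forms.get(token.lower())
        if features is None:
            continue
        for sense in lexeme.senses:
            if sense.item_id is not None:
                results.append((lexeme.lexeme_id, tuple(features), sense.item_id))
    return results


if __name__ == "__main__":
    for candidate in candidate_items("Banks", TOY_LEXEMES, "en"):
        print(candidate)
```

The sketch only produces candidates; choosing between them (for example, financial institution versus river bank) is a separate disambiguation step that it does not attempt. The verb lexeme matches the string but contributes nothing because its sense has no linked item, which hints at the kind of gap that mining results might help fill.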

This talk makes the case that Wikidata's information about lexemes and their relationships to items can, in principle, assist text mining efforts, yet the workflows for doing so have room for improvement. Conversely, text mining workflows typically have specific targets, yet they have to process large amounts of text to find them. Some of the intermediate results produced along the way may be useful for Wikidata.

The talk will sketch out some workflows that can connect Wikidata and text mining workflows in both directions, and it will highlight some examples.

Author name:

Daniel Mietchen

E-mail address:

daniel.mietchen@wikipedia.de

Wikimedia username:

Daniel Mietchen

Affiliated organization(s):

Estimated time:

7 min

Special requests:

Have you presented on this topic previously? If yes, where/when?:

No

Okay to livestream?

Livestreaming is okay

If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)

Yes