Submissions:2023/Wikidata Lexemes: Introduction to the possibilities

This submission has been accepted for WikiConference North America 2023.

Etherpad

Title:

Wikidata Lexemes: Introduction to the possibilities

Theme:

Languages, Technology

Type of session:

Workshop

Abstract:

Although the many contributors to Wikidata from the Americas represent a linguistically diverse group, this diversity is not currently reflected in the project's lexicographical data: many English, French, and Spanish lexemes remain woefully underdeveloped, and there is a negligible number of lexemes in any language indigenous to either continent. Part of this may be due to the currently inadequate exposure that lexicographical data has to speakers from the Americas (most languages whose lexemes are actively improved are spoken predominantly in Asia, spoken predominantly in Africa, or primarily confined to Europe), which potentially diminishes any benefits that the upcoming Abstract Wikipedia project might provide to language communities from the Americas. It is hoped that this workshop may at least begin to address this imbalance.

This workshop will introduce the creation, improvement, and current uses of Wikidata lexicographical data: that is, lexemes ("items for words") and the inflections and meanings they may take on. Participants will learn, depending on the languages they speak, either to build lexemes representing concepts in their language or to add information to existing lexemes where it is absent, particularly meanings and links from those meanings to Wikidata items and to other meanings. It is hoped that participants gain a basic understanding of what is useful to count as a lexeme and what details lexemes should ideally contain, as a precursor to more significant modeling decisions for their languages, so that those lexemes ultimately benefit other parts of Wikidata and potentially projects outside of it as well.

The workshop will additionally aim to motivate the necessity of lexemes for the upcoming Abstract Wikipedia project. It will include a demonstration of how lexemes (possibly ones created during the workshop!) play a role in the Ninai/Udiron software, which takes an abstract representation of one or more sentences and produces textual equivalents in different languages. It is hoped that this demonstration will show more clearly, especially to speakers of languages most likely to benefit from Abstract Wikipedia (such as those with smaller or non-existent Wikipedias), where their efforts might be focused if they wish to better prepare for that project.

Author name:

Mahir Morshed

E-mail address:

mahir256 [at] live.com

Wikimedia username:

Mahir256

Affiliated organization(s):

Estimated time:

60 min

Special requests:

Have you presented on this topic previously? If yes, where/when?:

previously presented on related topics:

Arctic Knot 2021
LD4 2021
WikidataCon 2021 (from 1:08:07)
WikiIndaba 2021 (from 1:00:23)
Wikimania 2022
Live Wikidata Editing (from 22:25)
Live Wikidata Editing (around Celtic Knot 2022, but not part of it)

Okay to livestream?

Livestreaming is okay

If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)

possibly as a presentation (although relevance to the North American audience would not be guaranteed)
