Submissions:2023/Wikidata Lexemes: Introduction to the possibilities

From WikiConference North America

This submission has been accepted for WikiConference North America 2023.


Wikidata Lexemes: Introduction to the possibilities


Languages, Technology

Type of session:



Although the many contributors to Wikidata from the Americas represent a linguistically diverse group, this diversity is not currently reflected in the project's lexicographical data: many English, French, and Spanish lexemes remain woefully underdeveloped, and the number of lexemes in any language indigenous to North or South America is negligible. Part of this may be due to the limited exposure lexicographical data currently has among speakers from the Americas; most languages whose lexemes are actively improved are spoken predominantly in Asia or Africa, or are primarily confined to Europe. This imbalance could limit the benefits that the upcoming Abstract Wikipedia project might provide to language communities from the Americas. It is hoped that this workshop can at least begin to address it.

This workshop will introduce the creation, improvement, and current uses of Wikidata lexicographical data, that is, lexemes ("items for words") and the inflected forms and meanings they may take on. Depending on the languages they speak, participants will learn either to create lexemes representing concepts in their language or to add missing information to existing lexemes, particularly meanings and links from those meanings to Wikidata items and to other meanings. Participants should come away with a basic understanding of what is useful to count as a lexeme and what details a lexeme should ideally contain, as a precursor to more significant modeling decisions for their languages, so that those lexemes ultimately benefit other parts of Wikidata and potentially projects outside it as well.

The workshop will also aim to explain why lexemes are necessary for the upcoming Abstract Wikipedia project. It will include a demonstration of how lexemes (possibly ones created during the workshop!) are used by the Ninai/Udiron software, which takes an abstract representation of one or more sentences and produces equivalent text in different languages. This demonstration should show more clearly where participants might focus their efforts to prepare for that project, especially speakers of languages most likely to benefit from Abstract Wikipedia (such as those with small or non-existent Wikipedias).

Author name:

Mahir Morshed

E-mail address:

Wikimedia username:


Affiliated organization(s):

Estimated time:

60 min

Special requests:

Have you presented on this topic previously? If yes, where/when?:

previously presented on related topics:

Okay to livestream?

Livestreaming is okay

If your submission is not accepted, would you be open to presenting your topic in another part of the program? (e.g. lightning talk or unconference session)

possibly as a presentation (although relevance to the North American audience would not be guaranteed)