Submissions:2025/Beyond Wikivecs: How to Use Dense Vectors to Explore and Expand Wikipedia
This submission has been noted and is pending review for WikiConference North America 2025.
Title:
- Beyond Wikivecs: How to Use Dense Vectors to Explore and Expand Wikipedia
Type of session:
- Lecture (15-30 min)
Session theme(s):
- Missing pieces, Future of Wikipedia
Abstract:
Lightning talk from Wikipedia Day NYC 2025:
Dense vector representations have opened up new ways to explore, understand, and improve Wikipedia. In early 2025, we released Wikivecs, the first fully open and reproducible dataset of dense vector embeddings for every article in multilingual Wikipedia. Built with a permissively licensed multilingual text encoder, the dataset aligns content across languages in a shared vector space, which enabled the rapid discovery of several content silos across language editions.
This lecture will build on that foundation and focus on how contributors, researchers, and tool developers can use dense vectors to analyze and improve Wikipedia. Participants will learn the basics of working with vector representations, including:
- How to understand and interpret vector representations
- How to understand and interpret data maps
- How to identify missing or inconsistent content across language editions
- How to cluster and visualize articles by topic or conceptual similarity
We’ll also explore future directions for Wikipedia tooling with vectors, from bias detection to recommendation systems. No prior experience with machine learning is required; we’ll walk through everything with practical examples and open-source tools.
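As a rough illustration of the gap-finding idea listed above, here is a minimal sketch that flags articles in one language edition with no close match in another. It assumes each article already has a vector in a shared multilingual space. The function name `find_uncovered`, the 0.8 threshold, the toy three-dimensional vectors, and the article titles are all hypothetical choices made for this example; none of them come from the Wikivecs dataset or its tooling.

```python
import numpy as np

def find_uncovered(source_vecs, target_vecs, threshold=0.8):
    """Return indices of source articles whose best cosine match
    in the target edition falls below `threshold` (an assumed cutoff)."""
    # Normalize rows so the dot product equals cosine similarity.
    src = source_vecs / np.linalg.norm(source_vecs, axis=1, keepdims=True)
    tgt = target_vecs / np.linalg.norm(target_vecs, axis=1, keepdims=True)
    sims = src @ tgt.T            # shape: (n_source, n_target)
    best = sims.max(axis=1)       # closest target article per source article
    return [int(i) for i in np.flatnonzero(best < threshold)]

# Toy stand-ins for real embeddings (hypothetical titles and values).
en_titles = ["Jazz", "Volcano", "Quilting"]
en_vecs = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
es_vecs = np.array([[0.9, 0.1, 0.0],     # close to "Jazz"
                    [0.1, 0.95, 0.0]])   # close to "Volcano"

gaps = find_uncovered(en_vecs, es_vecs)
print([en_titles[i] for i in gaps])
```

In this toy setup only the third article lacks a near neighbor in the second edition. With real embeddings, the threshold would need tuning, and a low similarity score is only a hint that coverage may be missing, not proof.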
Author name(s):
Wikimedia username(s):
Affiliated organization(s):
- Nomic AI, Johns Hopkins University
Estimated length of session:
- 30 minutes
Will you be presenting remotely?
- I will present in-person
Okay to livestream?
- Livestreaming is okay
Previously presented?
- I presented an earlier version of this work at Wikipedia Day NYC 2025.
Special requests:
- The dataset referenced in this talk is currently under review as a submission to https://meta.wikimedia.org/wiki/NLP_for_Wikipedia_(ACL_2025)/Call_for_Papers