Submissions:2018/Supporting Search in Many Languages

Supporting Search in Many Languages
 * Title:

Tech & Tools
 * Theme (optional):

No
 * Academic Peer Review option:

Presentation
 * Type of submission:

Trey Jones
 * Author:

trey@wikimedia.org
 * E-mail address:

TJones (WMF)
 * Wikimedia username:

Wikimedia Foundation
 * Affiliation(s) (optional):


 * Abstract:

The Wikimedia Foundation Search Platform Team supports search in over 250 languages—admittedly with varying degrees of “support”. We have to work within the resource limitations of a small team and balance the absolute impact of improving search on big wikis in well-supported languages against the relative impact of improving search on smaller wikis in less well-supported languages.

Many aspects of English make searching in English relatively easy, and every language that differs from English in an interesting way suffers for it when it comes to search. We’ll start with a basic overview of the components of search, then take a quick survey of some of the features of other languages that make search even more challenging and review the techniques used to meet those challenges. After that, maybe we’ll commiserate a little bit over a few problems that are more difficult.

We’ll also review some useful semi-automated techniques for evaluating changes to language processing—particularly in languages no one on our team is familiar with—including heuristics for highlighting problem areas, and good ways to leverage limited assistance from native speakers who are not search experts.

We’ve done a lot to improve search on various wikis over the last several years, so we’ll touch on a few other interesting search projects—from the biggest to the smallest—that our team has worked on, time permitting.

Finally, we’ll have a short Q&A period, during which people are encouraged to tell me why search sucks for them in their favorite language or on their favorite wiki. I can’t promise everything (or even anything) will get fixed quickly (or ever)—how’s that for a disclaimer!—but I do love learning about and working on interesting language-related search problems!

30-45 minutes
 * Length of presentation:

None
 * Special requests:

~25
 * Preferred room size:

No
 * Have you presented on this topic previously? If yes, where/when?:

Yes
 * If you will be incorporating a slidedeck during your presentation, do you agree to upload it to Commons before your session, with a CC-BY-SA 4.0 license, including suitable attribution in the slidedeck for any images used?:

Yes
 * Will you attend WikiConference North America if your submission is not accepted?:

Interested attendees
'''If you are interested in attending this session, please sign with your username below. This will help reviewers to decide which sessions are of high interest. Sign with four tildes (~~~~).'''


 * 1) Add your username here.