Submissions:2018/Lightning Talk on Homoglyphs
Lightning Talk on Homoglyphs
- Theme (optional)
Tech & Tools
- E-mail address
- Wikimedia username
- Affiliation(s) (optional)
The string “ᎳⅰkіСоnfеrеnce Νοrth ᗅmeriⅽa” is made up of a mixture of Latin, Cyrillic, Greek, Cherokee, Canadian Syllabics, and Roman numeral characters. Characters that look like characters in a different script are called “homoglyphs”. In the right font, certain Latin, Cyrillic, and Greek letters are almost indistinguishable, for example: A/Α/А, M/Μ/М, O/Ο/О, or T/Τ/Т.
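To make the idea concrete, here is a minimal Python sketch that labels each character of a word with a rough script name and reports which scripts are mixed together. It is only an illustration, not the tool demoed in the talk: the helper names `script_of` and `scripts_in` are hypothetical, and taking the first word of the Unicode character name is a crude heuristic (for example, Roman numeral characters are named "SMALL ROMAN NUMERAL ...", so they would not be labeled by script).

```python
import unicodedata


def script_of(ch):
    """Rough script label: the first word of the Unicode character name.

    This is a heuristic assumption, not a real script-property lookup.
    """
    name = unicodedata.name(ch, "UNKNOWN")
    return name.split()[0]


def scripts_in(word):
    """Set of rough script labels used by the non-space characters of a word."""
    return {script_of(ch) for ch in word if not ch.isspace()}


# The A/Α/А example: Latin, Greek, and Cyrillic capital A.
latin_a, greek_a, cyrillic_a = "A", "\u0391", "\u0410"
print(script_of(latin_a), script_of(greek_a), script_of(cyrillic_a))

# A word whose second letter is a Cyrillic homoglyph of Latin "i".
mixed = "W\u0456ki"
print(scripts_in(mixed))
```

A word that reports more than one script is a candidate for review, though legitimate mixed-script text exists, so a human check is still needed.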
Homoglyphs, whether introduced accidentally or through vandalism, make the words that contain them all but impossible to find by searching.
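The search problem can be sketched in a few lines of Python. This is a hedged illustration using plain substring matching rather than Wikipedia's actual search stack; the sample strings are made up. It also shows that Unicode NFKC normalization only helps with compatibility characters such as Roman numerals, not with Cyrillic or Greek look-alikes.

```python
import unicodedata

query = "Wikipedia"

# U+0456 is a Cyrillic letter that looks like Latin "i" (made-up sample text).
cyrillic_text = "W\u0456kipedia is a free encyclopedia"

# Plain substring search misses the word, because the code points differ.
found_exact = query in cyrillic_text

# NFKC normalization leaves the Cyrillic letter unchanged, so it still misses.
found_after_nfkc = query in unicodedata.normalize("NFKC", cyrillic_text)

# U+2170 (SMALL ROMAN NUMERAL ONE) does have a compatibility mapping to "i",
# so NFKC repairs this particular case.
roman_text = "W\u2170kipedia"
found_roman_after_nfkc = query in unicodedata.normalize("NFKC", roman_text)

print(found_exact, found_after_nfkc, found_roman_after_nfkc)
```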
In this very brief talk, I’ll tell how I discovered Wikipedia’s homoglyph problem and became obsessed with finding and fixing homoglyphs, demo a semi-automated tool for doing so, and discuss plans for a future project to try to solve the problem once and for all!
- Length of presentation
- Special requests
Need to be able to connect my computer to a projector
- Preferred room size
Wherever the other lightning talks are!
- Have you presented on this topic previously? If yes, where/when?
Yes: Lightning talk at the Haystack Relevance Conference (April 2018)
- If you will be incorporating a slidedeck during your presentation, do you agree to upload it to Commons before your session, with a CC-BY-SA 4.0 license, including suitable attribution in the slidedeck for any images used?
I’ll have a brief demo but no slide deck.
- Will you attend WikiConference North America if your submission is not accepted?
If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).
- Add your username here.