Submissions:2018/Lightning Talk on Homoglyphs

From WikiConference North America

This submission will not be reviewed as it has been submitted as an Unconference session. Please see 2018/Unconference for more information.


Lightning Talk on Homoglyphs

Theme (optional)

Tech & Tools

Academic Peer Review option


Type of submission

Unconference—Lightning talk


Trey Jones

E-mail address

Wikimedia username

TJones WMF

Affiliation(s) (optional)

Wikimedia Foundation


The string “ᎳⅰkіСоnfеrеnce Νοrth ᗅmeriⅽa” is made up of a mixture of Latin, Cyrillic, Greek, Cherokee, Canadian Syllabics, and Roman numeral characters. Characters that look like other characters in a different script are called “homoglyphs”, and in the right font, Latin, Cyrillic, and Greek letters are almost indistinguishable, for example: A/Α/А, M/Μ/М, O/Ο/О, or T/Τ/Т.
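As a rough illustration of how such a mixed-script string can be spotted, here is a minimal Python sketch. It is not the semi-automated tool mentioned in this abstract. The names `SCRIPT_KEYWORDS`, `script_hint`, and `mixed_script_words` are hypothetical, and guessing the script from a keyword in the Unicode character name is only a heuristic assumed for brevity; a more careful approach would use the Unicode Script property.

```python
import unicodedata

# Heuristic keywords looked up in Unicode character names. This is an
# assumption of the sketch, not an official script classification.
SCRIPT_KEYWORDS = ("ROMAN NUMERAL", "CANADIAN SYLLABICS", "CHEROKEE",
                   "CYRILLIC", "GREEK", "LATIN")

def script_hint(ch):
    """Guess a character's script from its Unicode name; 'OTHER' if unknown."""
    name = unicodedata.name(ch, "")
    for keyword in SCRIPT_KEYWORDS:
        if keyword in name:
            return keyword
    return "OTHER"

def mixed_script_words(text):
    """Return (word, scripts) pairs for words that mix more than one script."""
    flagged = []
    for word in text.split():
        # Digits and punctuation fall under "OTHER" and are ignored.
        scripts = {script_hint(ch) for ch in word} - {"OTHER"}
        if len(scripts) > 1:
            flagged.append((word, sorted(scripts)))
    return flagged

# "W" + Cyrillic i + "ki", then Greek Nu + Greek omicron + "rth", then plain Latin.
sample = "W\u0456ki \u039d\u03bfrth talk"
for word, scripts in mixed_script_words(sample):
    print(word, scripts)
```

Flagging every mixed-script word would also catch legitimate text, such as a Latin brand name with a Cyrillic suffix, so a check like this can only suggest candidates for a human to review.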

Homoglyphs, whether introduced accidentally or through vandalism, make the words that contain them all but impossible to find by searching, because a query typed in the expected script does not match them.
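A small Python sketch of why this happens, under the assumption that matching is done on normalized code points (the helper `find_word` is hypothetical and does not represent how any particular search engine works): standard NFKC normalization folds the Roman numeral look-alikes back to Latin letters, but it leaves Cyrillic and Greek look-alikes untouched, so those words still fail to match.

```python
import unicodedata

def find_word(query, text):
    """Substring match after NFKC normalization and case folding."""
    def norm(s):
        return unicodedata.normalize("NFKC", s).casefold()
    return norm(query) in norm(text)

clean = "Visit Chicago"
cyrillic = "Visit \u0421hicago"   # first letter is CYRILLIC CAPITAL LETTER ES
roman = "Visit Ch\u2170cago"      # third letter is SMALL ROMAN NUMERAL ONE

print(find_word("Chicago", clean))     # all-Latin text matches
print(find_word("Chicago", cyrillic))  # the Cyrillic look-alike does not match
print(find_word("Chicago", roman))     # NFKC maps the Roman numeral to Latin "i"
```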

In this very brief talk, I’ll describe how I discovered Wikipedia’s homoglyph problem and became obsessed with finding and fixing homoglyphs, demo a semi-automated tool for doing so, and discuss plans for a future project to try to solve the problem once and for all!

Length of presentation

~5 minutes

Special requests

Need to be able to connect my computer to a projector

Preferred room size

Wherever the other lightning talks are!

Have you presented on this topic previously? If yes, where/when?

Yes: Lightning talk at the Haystack Relevance Conference (April 2018)

If you will be incorporating a slidedeck during your presentation, do you agree to upload it to Commons before your session, with a CC-BY-SA 4.0 license, including suitable attribution in the slidedeck for any images used?

I’ll have a brief demo but no slide deck.

Will you attend WikiConference North America if your submission is not accepted?


Interested attendees

If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).

  1. Add your username here.