Submissions:2014/Fit for Public Display: Rethinking Censorship via a Comparison of Chinese Wikipedia with Hudong and Baidu Baike


 * Title of the submission: Fit for Public Display: Rethinking Censorship via a Comparison of Chinese Wikipedia with Hudong and Baidu Baike


 * Themes (Proposal Themes - Community, Tech, Outreach, GLAM, Education): Outreach


 * Type of submission (Presentation Types - Panel, Workshop, Presentation, etc): Presentation


 * Author of the submission: Jason Q. Ng


 * E-mail address: jason54mail *at* gmail *dot* com


 * Username: jasonqng


 * US state or country of origin: NY


 * Affiliation, if any (organization, company etc.): The Citizen Lab, University of Toronto


 * Personal homepage or blog: Blocked on Weibo


 * Abstract (at least 300 words to describe your proposal):

In 2008, Baidu’s chief scientist William Chang said, “There’s, in fact, no reason for China to use Wikipedia. . . It’s very natural for China to make its own products.” Today Hudong and Baidu Baike greatly eclipse the Chinese-language version of Wikipedia despite (or because of) the censorship known to take place on the sites. However, identifying outright instances or patterns of censorship can be difficult because the content is (mostly) generated and overseen by users.

Instead, I’ve developed a project that attempts a large-scale comparison of the three services, matching thousands of Chinese-language Wikipedia articles with their in-China counterparts in order to identify the “content gaps” in the two baike (Chinese for “encyclopedia,” used here to refer to Hudong’s and Baidu’s online encyclopedias). Censorship, or at the very least anomalies in the generation of content, might be identified by articles that don’t exist, by “protected” articles that regular users cannot edit, and by articles that are much shorter than their counterparts on Chinese Wikipedia. I emphasize “might” because oversight of these online encyclopedias is distributed: not only governments but also companies and users get to play the role of content gatekeeper. This decentralization makes it harder to attribute responsibility for apparent censorship, a topic I hope to explore more deeply by examining how censorship functions in these online encyclopedias.
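The three signals described above (missing, protected, and much shorter articles) can be illustrated with a minimal Python sketch. It is not the project's actual matching algorithm, which this abstract does not describe. It assumes articles have already been matched by an identical title key, that article length is a simple character count, and that "much shorter" means under a quarter of the Wikipedia length. The names `BaikeEntry`, `flag_content_gaps`, and `short_ratio` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BaikeEntry:
    """Hypothetical record for one baike article (not a real API object)."""
    length: int       # character count of the baike article
    protected: bool   # True if regular users cannot edit the article


def flag_content_gaps(
    wiki_lengths: Dict[str, int],
    baike_entries: Dict[str, BaikeEntry],
    short_ratio: float = 0.25,  # assumed threshold, not taken from the project
) -> Dict[str, List[str]]:
    """Return, per Wikipedia title, the reasons it may be a content gap."""
    flags: Dict[str, List[str]] = {}
    for title, wiki_len in wiki_lengths.items():
        entry = baike_entries.get(title)
        reasons: List[str] = []
        if entry is None:
            # No counterpart article was found in the baike.
            reasons.append("missing")
        else:
            if entry.protected:
                reasons.append("protected")
            # Flag baike articles far shorter than the Wikipedia article.
            if wiki_len > 0 and entry.length < short_ratio * wiki_len:
                reasons.append("much_shorter")
        if reasons:
            flags[title] = reasons
    return flags
```

A flagged title is only a candidate for closer reading: as noted above, a gap may reflect decisions by companies or users rather than government censorship.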

By examining the topics and articles that are left visible in these baike and considering the motivations of those who seek out, view, edit, and approve these articles, this project hopes to offer a more nuanced view of the typical narratives about censorship in China. Understanding what sorts of expressions netizens are making via these online encyclopedias, despite whatever censorship might be taking place, is as interesting as the potential censorship itself.

This presentation showcases some preliminary results. In the coming year, I hope to expand the project by refining the matching algorithms and performing in-depth case studies of specific articles that exemplify the different ways Internet users on these three services write about similar topics. This project will hopefully push us to consider once again the many complexities of discussing information control in environments where oversight of content has been decentralized to companies and users, environments in which it is increasingly hard to identify traditional instances of censorship.


 * Length of presentation/talk (see Presentation Types for lengths of different presentation types): 15 minutes, perhaps as part of a panel on censorship or global usage of Wikipedia


 * Will you attend WikiConference USA if your submission is not accepted?: Yes


 * Slides or further information (optional): draft of research report


 * Special request as to time of presentations:

Interested attendees
'''If you are interested in attending this session, please sign with your username below. This will help reviewers decide which sessions are of high interest. Sign with four tildes (~~~~).'''


 * 1) Geraldshields11 (talk) 09:06, 27 May 2014 (EDT)
 * 2) Add your username here.