Difference between revisions of "2019/Grants/Explicit credibility signal data on wikipedia"

From WikiConference North America
 
|email=sandro{{@}}w3.org
 
 
|resume=* Larger project website: https://credweb.org
 
* Our analysis: https://www.w3.org/2018/10/credibility-tech/
 
 
* My resume: https://hawke.org/resume-2020/
 
 
|geography=global
 
 
|type=Technology
 
|idea=Let's connect wikipedians to the emerging ecosystem of credibility data. Let's draw on their expertise and diligence to create community-sourced credibility data, letting individual wikipedians express credibility signals and interact with other credibility data.
 
 
We can do this by giving people the tools to see how credibility data relates to their work on Wikipedia, giving them more insight into which sources are reliable. As people learn to navigate the credibility landscape, they can increasingly help others make their own decisions about source reliability.
 
 
Once this seed community has tested and refined the credibility signals process, this approach has the potential to rapidly grow to global scale.
 
 
The technologies for this idea have been developed in the W3C Credible Web Community Group and grow in part out of earlier work at the Credibility Coalition (then called the Credibility Indicators Working Group). This approach is based on the insight that determining whether to trust someone is best done with lots of data, and that data itself needs to come from sources we trust. Thus, we imagine each person having their own credibility network, feeding them data they use to decide what to believe. This is a model for how humans behave every day, and it is also a blueprint for how software can help combat misinformation. The idea is to use computers to push back against the flood of misinformation other computer systems have helped bring us.
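As a rough illustration of the "personal credibility network" idea, the sketch below computes one person's view of source credibility from the ratings published by the people they trust. It is not the Credible Web Community Group's software: the function name `personal_view`, the `trust` and `ratings` dictionaries, the `decay` factor, and the two-hop limit are all assumptions made for this example.

```python
from collections import defaultdict


def personal_view(me, trust, ratings, decay=0.5, max_hops=2):
    """Estimate source credibility from the viewpoint of `me`.

    trust:   {person: {other_person: weight in [0, 1]}}  (who listens to whom)
    ratings: {person: {source: score in [-1, 1]}}        (what each person says)
    Both structures are hypothetical stand-ins for real credibility data.
    """
    # Walk outward from `me` breadth-first, shrinking influence at every hop.
    influence = {me: 1.0}
    frontier = [me]
    for _ in range(max_hops):
        next_frontier = []
        for person in frontier:
            for other, weight in trust.get(person, {}).items():
                if other in influence:
                    continue  # keep the first influence found (fewest hops)
                influence[other] = influence[person] * weight * decay
                next_frontier.append(other)
        frontier = next_frontier

    # Influence-weighted average of every rating visible from `me`.
    totals = defaultdict(float)
    weights = defaultdict(float)
    for person, w in influence.items():
        for source, score in ratings.get(person, {}).items():
            totals[source] += w * score
            weights[source] += w
    return {s: totals[s] / weights[s] for s in totals if weights[s] > 0}
```

Each person runs this over their own trust list, so two people can reach different conclusions about the same source. That is the intended behavior of a per-person network rather than a single global score.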
 
 
For this grant, we propose to use wikipedia user pages as the storage medium for credibility statements, such as those seen in [https://credweb.org/reviewed-signals/ Reviewed Credibility Signals]. These statements have the form and semantics of natural language, and should not be too out of place on a wikipedia user page or sub-page, but also have machine-readable semantics, because they adhere to defined templates. As such, they can fully participate in a credibility data ecosystem, as envisioned in this work.
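To make "natural language that adheres to defined templates" concrete, here is a minimal sketch of how software might read such statements from the text of a user page. The two sentence templates below are invented for illustration and are not taken from the Reviewed Credibility Signals list; the function name `parse_statements` and the output fields are likewise assumptions.

```python
import re

# Hypothetical templates, for illustration only. Real signal templates
# would come from the published credibility signal definitions.
TEMPLATES = {
    "reliable": re.compile(
        r"^I consider (?P<source>\S+) to be a reliable source\.$"
    ),
    "unreliable": re.compile(
        r"^I consider (?P<source>\S+) to be an unreliable source\.$"
    ),
}


def parse_statements(page_text, author):
    """Return one record per line of page_text that matches a template."""
    statements = []
    for line in page_text.splitlines():
        # Tolerate wiki bullet markers and surrounding whitespace.
        line = line.strip().lstrip("*").strip()
        for signal, pattern in TEMPLATES.items():
            match = pattern.match(line)
            if match:
                statements.append(
                    {
                        "author": author,
                        "signal": signal,
                        "source": match.group("source"),
                    }
                )
                break
    return statements
```

Lines that match no template are ignored, so ordinary user-page prose can sit alongside the statements without affecting the extracted data.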
 
 
We propose to adapt our software to work with these user pages, and with other wikipedia data streams, such as "revert" and "thanks" events, which can be used as an initial, bootstrapping proxy for credibility data, to help make the system useful to early adopters.
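One way the bootstrapping proxy could work is sketched below: each "thanks" event nudges the actor's implied opinion of the target editor upward, and each "revert" nudges it downward. The event record shape (`type`, `actor`, `target`), the function name `proxy_scores`, and the weights are assumptions for this example, not the format of any actual Wikipedia data feed.

```python
from collections import defaultdict


def proxy_scores(events, thanks_weight=1.0, revert_weight=-1.0):
    """Fold revert/thanks events into a score per (actor, target) pair.

    events: iterable of dicts such as
        {"type": "thanks", "actor": "A", "target": "B"}
    The record shape is hypothetical; other event types are skipped.
    """
    weights = {"thanks": thanks_weight, "revert": revert_weight}
    scores = defaultdict(float)
    for event in events:
        weight = weights.get(event["type"])
        if weight is None:
            continue  # not a signal this sketch knows how to interpret
        scores[(event["actor"], event["target"])] += weight
    return dict(scores)
```

These scores are only a coarse starting point. A revert can reflect a content dispute rather than low credibility, which is why the text treats them as an initial proxy to be replaced by explicit statements over time.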
 
 
Finally, we propose to engage with wikipedia projects with related interests and adapt our software as necessary to become a regular useful tool for them.
 
   
 
|importance=For Wikipedia, this idea promises to help in the fight against misinformation, making it easier for wikipedians and the broader world to collaborate in identifying credible and non-credible sources.
 

Revision as of 18:48, 4 April 2020


Title:

Explicit credibility signal data on wikipedia

Name:

Sandro Hawke

Wikimedia username:

Sandro_Hawke

E-mail address:

sandro@w3.org

Resume:

Geographical impact:

global

Type of project:

Technology

What is your idea?

See https://docs.google.com/document/d/1kdwuzWqnh3-As3Uyiyoo2Uk7AnzmXmVTZARBOwpY4gY/edit

Why is it important?

For Wikipedia, this idea promises to help in the fight against misinformation, making it easier for wikipedians and the broader world to collaborate in identifying credible and non-credible sources.

For the world at large, the stakes are much higher, as this approach has the potential to turn the tide against misinformation across all technology platforms.

Is your project already in progress?

We are developing the relevant concepts and tools (as seen at https://credweb.org) but have not begun deployment in the wikipedia community or tooling to work with wikipedia data feeds.

How is it relevant to credibility and Wikipedia? (max 500 words)

There are many connections between this Credibility Signals work and Wikipedia:

  • Wikipedia has always needed to be able to separate fact from fiction. While it does this very well, these tools might make the task easier. Specifically, they can rapidly highlight which sources have unacceptably low credibility and help with sorting out why particular sources are viewed as credible or not credible.
  • Wikipedia has always needed to reduce harm done by careless and malicious users. It does this very well, but again, these tools might make the task easier, assisting in tracking and management of the reputation of users, which can be used in modifying their privileges.
  • Because of its great expertise in these fields, the Wikipedia community is an excellent proving ground for these technologies. Flaws in the technologies that might eventually lead to failure in the broader media ecosystem are likely to be spotted very quickly by wikipedians, giving time to improve the designs before wider deployment.

What is the ultimate impact of this project?

If successful, this project will show a clear way that people can collaborate online in protecting themselves and their communities from misinformation. This method can be adopted by communities and platforms around the world to greatly reduce misinformation and other online harms.

Could it scale?

Yes, this plan is phenomenally scalable.

It is based on existing social practices, where each individual manages their own credibility assessment process (deciding what to believe), using what they can glean from their surroundings, including their social network. This process scales linearly with the number of individuals, with each individual deciding how much of their own resources to devote to each assessment they make. Adding computers and networking to this existing human process should greatly improve the efficiency and accuracy of this process, without altering this scaling behavior.

In its approach to decentralization, this design avoids any central bottleneck. Every individual and organization is free to deploy as much human and computing resources as they choose, without needing approval or support from us or anyone else. This allows the kind of scaling we see in the web and email, which are similarly decentralized, but much faster since the underlying infrastructure is already in place. If the system provides sufficient value to users, as we expect, this approach might grow to global scale in a matter of months.

The pace of scaling may also be quite rapid because it naturally spreads over social connections and social media. While it relies on software, which is often slow to develop, the software can come from any source, reducing this risk. Because of the social connections, the person-to-person spread may resemble the spread of ideas (memes) more than the slower (but still rapid) spread of technology platforms. At this point, in April 2020, we are perhaps all too familiar with the power of things that can spread person-to-person, out of control.

Why are you the people to do it?

I bring experience and expertise in all the necessary challenge areas, including credibility signals, community development, web application development, decentralized systems, and consensus process.

What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?

This project has no direct connection to diversity or inclusiveness. We do not foresee any specific indirect impact. We are aware that decentralized systems have a mixed track record on these issues, with Mastodon showing some promise, while other efforts use decentralization to route around platform Trust & Safety enforcement actions. Because our decentralization technology builds on top of existing platforms, re-using their social features rather than building its own (perhaps using cryptographic techniques), we do not expect it to manifest that difficulty.

What are the challenges associated with this project and how will you overcome them?

This is an ambitious piece of an ambitious project. We are reducing risk by maximizing simplicity and using a progression of small prototypes and experiments.

Challenges include:

  • Getting people to look at credibility data. Approach: make it visually appealing and salient. For example, see the credibility network demo at https://credweb.org/viewer/ which has elements that are compelling and fun; it becomes salient when we let people add in the sources they care about and see how others judge those sources. We can bootstrap with existing wikipedia data feeds of thanks and reverts as an initial proxy for credibility between wikipedians and draw on existing source credibility work for data on external sources.
  • Getting people to author credibility data. Once people are engaged with the data as consumers, we hypothesize they will be motivated to engage as producers to "correct" the data, to express what they believe or know. Additionally, a culture of contributing data to help the world, already common among wikipedians, should help. There is a range of ways to simplify or even gamify the contribution step, if necessary.
  • Harmful participants. Since we propose to primarily and initially use credibility data which is hosted on wikipedia user pages, to some degree the existing community safety mechanisms will still apply. We would like to demonstrate, however, that such mechanisms can be largely replaced by credibility data itself. In theory, people observed to do harm can be identified and have their actions demoted like non-credible content.
  • Getting people to trust the system. Approach: transparency and feedback. Make it clear which individuals are the source of each bit of data, with clear provenance and change tracking. Have the interface promote a virtuous cycle of improving the data and improving one's own credibility. This is similar to wikipedia's own mechanisms for being trustworthy (to people who know how it works).

How much money are you requesting?

10k USD for the Wikipedia aspects (outlined here) of the Credibility Signals work

How will you spend the money?

To support my time on this work

How long will your project take?

Up to 12 months, in three phases:

  • Phase 1 - up to four months - refine deployment plan, identify partners, settle issues within credweb CG
  • Phase 2 - about 2 months - active development of tools; release
  • Phase 3 - up to six months - revise and improve, based on user experience

Have you worked on projects for previous grants before?

Yes, my work has been primarily grant funded for many years. Some highlights with web pages maintained by others: