From WikiConference North America

Hello Barrett, thanks for this proposal. Could you say a bit more about what you have in mind?

Some specific questions:

  • How are you thinking of evaluating sources based on their article? What aspects of articles would you look at, in which languages?
  • How much of the composite credibility score is built now? Is there a demo?
  • Have you experimented w/ doing data analysis based on the WP API before?
  • How would you make the composite credibility assessments available -- do you have a schema / API in mind for that work?

Warmly, Sj (talk) 23:39, 22 April 2020 (UTC)

Reply to Sj

Q: Language?
A: To start, the project will be US only. The suspect sites are in English. The control group of US daily newspapers is mostly in English, with a few in Spanish, Arabic, and several Asian languages. My index of those dailies is at NewsNetrics

Q: Evaluating sources based on their article?
A: The evals aren't of articles but of the credibility of the news publisher, identified by its domain name, based on evals by trained reviewers (Media Bias/Fact Check, NewsGuard, etc.). My raw data for those unreliable sites is now in a spreadsheet: Iffy 2020-04.

Q: How much of the composite credibility score is built?
A: None. I first need to determine which signals most accurately distinguish fake-news sites from fact-based news sites and can also be pulled via API (e.g., year online, Wikipedia article info). The hypothesis is outlined in the repo: CredScore. (Once we know the most accurate signals, we may throw some AI at it to get the best weight for each signal.)
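To make the idea of a weighted composite concrete, here is a minimal sketch only. Nothing like this is built yet: the signal names, their scaling to the range 0 to 1, and the weights are all hypothetical placeholders, not values from the CredScore repo.

```python
def composite_score(signals: dict, weights: dict) -> float:
    """Weighted average of signal values, each assumed to be scaled to [0, 1].

    Signals missing for a given site are skipped, so the score is
    normalized by the weights of the signals actually present.
    """
    used = {name: w for name, w in weights.items() if name in signals}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no weighted signals available for this site")
    return sum(signals[name] * w for name, w in used.items()) / total_weight


# Hypothetical weights; in practice these would be fitted to labeled data.
EXAMPLE_WEIGHTS = {"years_online": 1.0, "has_wikipedia_article": 3.0}

# Hypothetical, already-normalized signals for one domain.
example_signals = {"years_online": 0.5, "has_wikipedia_article": 1.0}
print(composite_score(example_signals, EXAMPLE_WEIGHTS))
```

Normalizing by the weights actually used keeps scores comparable between sites when some signal cannot be pulled for a given domain.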

Q: Experimented w/ data analysis based on the WP API?
A: I've experimented with the Wikipedia API to confirm that using the news-site name (encoded) as search criteria often returns its article (if one exists), from which I can pull its infobox, from which I can often machine-distinguish fake from fact publications. For instance, see the infoboxes at:

For related projects I have scripts that regularly and programmatically pull, analyze, and present data from APIs, from sources like Alexa Web Information Service, BuiltWith, Google PageSpeed, and Internet Archive (see my overview: APIs and Tools for fact-checking). I'll adapt these scripts to the Wikipedia API.
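As a rough illustration of the lookup described above, the sketch below builds a search URL for the MediaWiki Action API (`action=query`, `list=search`) from a news-site name and pulls `| key = value` pairs out of an infobox in article wikitext. It is not my existing script: the function names are made up for this example, the infobox parser is deliberately naive (it ignores nested templates and multi-line values), and the sample wikitext is invented.

```python
import re
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(site_name: str) -> str:
    """Return a search URL for the news-site name; urlencode handles the encoding."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": site_name,
        "srlimit": 1,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def parse_infobox_fields(wikitext: str) -> dict:
    """Naively collect '| key = value' lines from the first infobox, if any."""
    start = wikitext.lower().find("{{infobox")
    if start == -1:
        return {}
    fields = {}
    # Skip the '{{Infobox ...' line itself, then read until the closing braces.
    for line in wikitext[start:].splitlines()[1:]:
        stripped = line.strip()
        if stripped.startswith("}}"):
            break
        match = re.match(r"\|\s*([^=|]+?)\s*=\s*(.*)$", stripped)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


# Invented sample wikitext, for illustration only.
SAMPLE = "{{Infobox newspaper\n| name = Example Daily\n| founded = 1901\n}}\nBody text."
print(build_search_url("Example Daily"))
print(parse_infobox_fields(SAMPLE))
```

Fetching the URL and the article wikitext is left out so the sketch runs offline; a production script would also need to handle redirects and disambiguation pages.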

Q: How would you make the composite credibility assessments available?
A: Step one would be presenting the data at the site and sharing the raw data via the site, which uses the WordPress CMS, so I can use its built-in REST API to output JSON. The data will also auto-import into a public Google spreadsheet, for folks who prefer data in that format. (I've configured sites and sheets to share similar data in other projects.)
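For anyone wanting to consume that JSON, a client could look roughly like the sketch below. It assumes, hypothetically, that each assessment is published as a standard WordPress post and read through the core `/wp-json/wp/v2/posts` route; the site address and the sample response are placeholders, and no schema has been decided.

```python
import json
from urllib.parse import urlencode


def posts_endpoint(site_base: str, per_page: int = 10) -> str:
    """Build the URL of the core WordPress REST route for posts."""
    query = urlencode({"per_page": per_page})
    return f"{site_base.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def titles_by_link(response_body: str) -> dict:
    """Map each post's permalink to its rendered title."""
    posts = json.loads(response_body)
    return {post["link"]: post["title"]["rendered"] for post in posts}


# Placeholder response in the shape WordPress returns for posts.
SAMPLE_BODY = '[{"link": "https://example.org/a/", "title": {"rendered": "Site A"}}]'
print(posts_endpoint("https://example.org/", per_page=5))
print(titles_by_link(SAMPLE_BODY))
```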

If CredScore proves effective at auto-detecting fake news with a high level of certainty, the next step might be browser extensions and advertising blacklists. Adaptable scripts for both already exist.