Talk:2019/Grants/Iffy.news


Hello Barrett, thanks for this proposal. Could you say a bit more about what you have in mind?

Some specific questions:

  • How are you thinking of evaluating sources based on their article? What aspects of articles would you look at, in which languages?
  • How much of the composite credibility score is built now? Is there a demo?
  • Have you experimented w/ doing data analysis based on the WP API before?
  • How would you make the composite credibility assessments available -- do you have a schema / API in mind for that work?

Warmly, Sj (talk) 23:39, 22 April 2020 (UTC)

Reply to Sj

Q: Language?
A: To start, Iffy.news will be US only. The language of suspect sites is English. The control group of US daily newspapers is mostly English-language, with a few papers in Spanish, Arabic, and several Asian languages. My index of those dailies is at NewsNetrics.

Q: Evaluating sources based on their article?
A: The evals aren't of individual articles but of the news publisher's credibility at the domain level, based on ratings by trained reviewers (Media Bias/Fact Check, NewsGuard, etc.). My raw data for those unreliable sites is now in a spreadsheet: Iffy 2020-04.
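For illustration, each entry boils down to a per-domain record along these lines (a minimal sketch; the field names are placeholders, not the actual spreadsheet columns):

<pre>
from dataclasses import dataclass
from typing import Optional

@dataclass
class DomainRating:
    """One row of the dataset: a rating of a news publisher's domain,
    not of any individual article. Field names are illustrative only."""
    domain: str                       # e.g. "example-news.com"
    mbfc_factual: Optional[str]       # Media Bias/Fact Check factual-reporting level
    newsguard_score: Optional[float]  # NewsGuard 0-100 trust score, if available
    year_online: Optional[int]        # first year the domain was active

# Example record (values are placeholders, not actual ratings)
example = DomainRating(domain="example-news.com",
                       mbfc_factual="low",
                       newsguard_score=12.5,
                       year_online=2016)
</pre>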

Q: How much of the composite credibility score is built?
A: None yet. I first need to determine which signals most accurately distinguish fake-news sites from fact-based ones and can be pulled via API (e.g., year online, Wikipedia article info). The hypothesis is outlined in the repo: CredScore. (Once we know the most accurate signals, we may throw some AI at it to get the best weight for each signal.)
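To make the weighting idea concrete, here's a minimal sketch of learning a weight per signal from labeled examples. The signal names and toy data are placeholders, not the real CredScore model:

<pre>
# Once the useful signals are known, learn a weight for each one from
# labeled examples (fake vs. fact-based). Toy data, placeholder signals.
from sklearn.linear_model import LogisticRegression

# Each row: [years_online, has_wikipedia_article, article_flags_it_as_fake]
X = [
    [25, 1, 0],   # long-established daily with a Wikipedia article
    [2,  1, 1],   # young site whose Wikipedia article flags it as fake news
    [1,  0, 0],   # young site with no Wikipedia article at all
]
y = [1, 0, 0]     # 1 = credible, 0 = not credible (toy labels)

model = LogisticRegression().fit(X, y)
print(model.coef_)                       # learned weight per signal
print(model.predict_proba([[3, 1, 0]]))  # composite score for a new domain
</pre>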

Q: Experimented w/ data analysis based on the WP API?
A: I've experimented with the Wikipedia API to confirm that using the news-site name (encoded) as search criteria often returns its article (if one exists); from that article I can pull the infobox, which often lets me machine-distinguish fake from fact-based publications. For instance, see the infoboxes at:
https://en.wikipedia.org/wiki/Enid_News_%26_Eagle
https://en.wikipedia.org/wiki/NewsPunch
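
For illustration, here is a minimal sketch of that kind of lookup against the public MediaWiki API: search for the publisher by name, fetch the top result's wikitext, and check for an infobox. The parsing is deliberately simplified:

<pre>
import requests

API = "https://en.wikipedia.org/w/api.php"

def infobox_wikitext(site_name):
    # Search for the publisher's article (if one exists)
    search = requests.get(API, params={
        "action": "query", "list": "search",
        "srsearch": site_name, "format": "json"}).json()
    hits = search["query"]["search"]
    if not hits:
        return None
    title = hits[0]["title"]
    # Pull the article's raw wikitext
    page = requests.get(API, params={
        "action": "query", "prop": "revisions", "rvprop": "content",
        "rvslots": "main", "titles": title, "format": "json"}).json()
    rev = next(iter(page["query"]["pages"].values()))["revisions"][0]
    text = rev["slots"]["main"]["*"]
    # Crude check: does the article's wikitext contain an infobox template?
    return text if "{{Infobox" in text else None

print(bool(infobox_wikitext("Enid News & Eagle")))
print(bool(infobox_wikitext("NewsPunch")))
</pre>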

For related projects I have scripts that regularly and programmatically pull, analyze, and present data from APIs, from sources like Alexa Web Information Service, BuiltWith, Google PageSpeed, Internet Archive, and WebpageTest.org (see my overview: APIs and Tools for fact-checking). I'll adapt these scripts to the Wikipedia API.

Q: How would you make the composite credibility assessments available?
A: Step one would be presenting the data on the site and sharing the raw data through it. The site runs on WordPress, so it can use the built-in REST API to output JSON. The data will also auto-import into a public Google spreadsheet, for folks who prefer that format. (I've configured sites and sheets to share similar data in other projects.)
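As a rough sketch of how that JSON could be consumed: WordPress serves routes like /wp-json/wp/v2/posts by default, and a custom route could expose the ratings. The route name and field names below are placeholders, since the schema isn't designed yet:

<pre>
import requests

# Hypothetical custom endpoint; the actual route and fields are not yet defined.
resp = requests.get("https://iffy.news/wp-json/iffy/v1/sites")
for row in resp.json():
    print(row.get("domain"), row.get("credscore"))
</pre>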

If CredScore proves effective at auto-detecting fake news with a high level of certainty, the next step might be browser extensions and advertising blacklists. Adaptable scripts for both already exist.