Difference between revisions of "Talk:2019/Grants/Iffy.news"

From WikiConference North America
(Answers to Sj's questions)

Revision as of 18:46, 24 April 2020

Hello Barrett, thanks for this proposal. Could you say a bit more about what you have in mind?

Some specific questions:

  • How are you thinking of evaluating sources based on their article? What aspects of articles would you look at, in which languages?
  • How much of the composite credibility score is built now? Is there a demo?
  • Have you experimented w/ doing data analysis based on the WP API before?
  • How would you make the composite credibility assessments available -- do you have a schema / API in mind for that work?

Warmly, Sj (talk) 23:39, 22 April 2020 (UTC)

Reply to Sj

Q: Language?
A: To start, Iffy.news will be US only. The suspect sites are in English. The control group of US daily newspapers is mostly in English, with a few in Spanish, Arabic, and several Asian languages. My index of those dailies is at NewsNetrics.

Q: Evaluating sources based on their article?
A: The evals aren't of articles but of the credibility of the news publisher: the domain name, based on evals by trained reviewers (Media Bias/Fact Check, NewsGuard, etc.). My raw data for those unreliable sites is now in a spreadsheet: Iffy 2020-04.

Q: How much of the composite credibility score is built?
A: None. I first need to determine which signals most accurately distinguish fake-news sites from fact-based news sites and can be pulled via API (e.g., year online, Wikipedia article info). The hypothesis is outlined in the repo: CredScore. (Once we know the most accurate signals, we may throw some AI at it to get the best weight for each signal.)
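To make the idea of weighted signals concrete, here is a minimal sketch of one possible way to combine them: a plain weighted average. Nothing here is built yet, and this is not the CredScore method itself. The function name, the signal names, the 0-to-1 scaling, and the weights are all hypothetical placeholders.

```python
from typing import Dict


def composite_score(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of signals, each assumed to be pre-scaled to 0..1."""
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(signals[name] * weights[name] for name in signals)
    return weighted_sum / total_weight


# Hypothetical signal values and weights, for illustration only.
example_signals = {"years_online": 0.8, "has_wikipedia_article": 1.0}
example_weights = {"years_online": 1.0, "has_wikipedia_article": 3.0}
example_score = composite_score(example_signals, example_weights)
```

In practice the weights would come from whatever fitting step is eventually chosen, not from hand-picked numbers like these.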

Q: Experimented w/ data analysis based on the WP API?
A: I've experimented with the Wikipedia API to confirm that searching on the (encoded) news-site name often returns the site's article, if one exists. From that article I can pull the infobox, which often lets me machine-distinguish fake from fact publications. For instance, see the infoboxes at: https://en.wikipedia.org/wiki/Enid_News_%26_Eagle https://en.wikipedia.org/wiki/NewsPunch

For related projects I have scripts that regularly and programmatically pull, analyze, and present data from APIs, from sources like Alexa Web Information Service, BuiltWith, Google PageSpeed, Internet Archive, and WebpageTest.org (see my overview: APIs and Tools for fact-checking). I'll adapt these scripts to the Wikipedia API.
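The infobox lookup described above could be sketched roughly as follows. This is an illustration under assumptions, not one of my existing scripts: it uses the MediaWiki Action API's `list=search` and `action=parse` modules to build request URLs, and a deliberately naive parser that only reads simple `| key = value` lines. The function names and the sample wikitext are hypothetical.

```python
import re
from typing import Dict
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(site_name: str) -> str:
    """URL for a full-text search on the news-site name (URL-encoded)."""
    params = {"action": "query", "list": "search",
              "srsearch": site_name, "srlimit": 1, "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def build_wikitext_url(title: str) -> str:
    """URL that returns the raw wikitext of the article with this title."""
    params = {"action": "parse", "page": title,
              "prop": "wikitext", "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def parse_infobox(wikitext: str) -> Dict[str, str]:
    """Collect '| key = value' lines from the first {{Infobox ...}} template.

    Naive on purpose: it stops at the first line that is exactly '}}' and
    does not handle nested templates that span several lines.
    """
    start = re.search(r"\{\{\s*Infobox\b", wikitext, flags=re.IGNORECASE)
    if not start:
        return {}
    fields: Dict[str, str] = {}
    for line in wikitext[start.end():].splitlines():
        stripped = line.strip()
        if stripped == "}}":
            break
        field = re.match(r"\|\s*([^=|]+?)\s*=\s*(.*)", stripped)
        if field:
            fields[field.group(1).lower()] = field.group(2).strip()
    return fields


# Hypothetical wikitext, standing in for a real API response.
SAMPLE_WIKITEXT = """{{Infobox newspaper
| name = Example Daily
| type = Daily newspaper
| founded = 1893
}}
Body text."""
sample_fields = parse_infobox(SAMPLE_WIKITEXT)
```

A real script would fetch both URLs, take the top search hit's title, and then decide which infobox fields are useful as signals; that choice is still an open question.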

Q: How would you make the composite credibility assessments available?
A: Step one would be presenting the data at the site and sharing the raw data via the site, which uses the WordPress CMS and so can use its built-in REST API to output JSON. The data will also auto-import into a public Google spreadsheet, for folk who prefer data in that format. (I've configured sites and sheets to share similar data in other projects.)
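As a rough sketch of what consuming that JSON might look like, the snippet below builds a URL for the core WordPress REST route `/wp-json/wp/v2/posts` and flattens a response into CSV rows suitable for a spreadsheet import. It assumes each site is stored as a standard WordPress post; the actual data model is not decided, and the function names, the example.org URLs, and the sample record are hypothetical.

```python
import csv
import io
import json
from urllib.parse import urlencode


def build_posts_url(site_url: str, per_page: int = 100, page: int = 1) -> str:
    """URL for one page of posts from a WordPress site's REST API."""
    query = urlencode({"per_page": per_page, "page": page})
    return site_url.rstrip("/") + "/wp-json/wp/v2/posts?" + query


def posts_to_csv(json_text: str) -> str:
    """Flatten a JSON array of WordPress posts into CSV text."""
    posts = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "title", "link"])
    for post in posts:
        # In the core posts schema, the title sits under title.rendered.
        writer.writerow([post["id"], post["title"]["rendered"], post["link"]])
    return buffer.getvalue()


# Hypothetical response body, shaped like the core posts schema.
SAMPLE_JSON = json.dumps([
    {"id": 1, "title": {"rendered": "Example Site"},
     "link": "https://example.org/example-site"}
])
sample_csv = posts_to_csv(SAMPLE_JSON)
```

Credibility fields would most likely need to be exposed as custom fields or a custom post type, which this sketch does not cover.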

If CredScore proves effective at auto-detecting fake news with a high level of certainty, the next step might be browser extensions and advertising blacklists. Adaptable scripts for both already exist.