Difference between revisions of "Talk:2019/Grants/Iffy.news"
Revision as of 18:41, 24 April 2020
Hello Barrett, thanks for this proposal. Could you say a bit more about what you have in mind?
Some specific questions:
- How are you thinking of evaluating sources based on their article? What aspects of articles would you look at, in which languages?
- How much of the composite credibility score is built now? Is there a demo?
- Have you experimented w/ doing data analysis based on the WP API before?
- How would you make the composite credibility assessments available -- do you have a schema / API in mind for that work?
Warmly, Sj (talk) 23:39, 22 April 2020 (UTC)
Reply to Sj
Q: Language?
A: To start, Iffy.news will be US only. The suspect sites are in English. The control group of US daily newspapers is mostly in English, with a few in Spanish, Arabic, and several Asian languages. My index of those dailies is at NewsNetrics (https://news.pubmedia.us/).
Q: Evaluating sources based on their article?
A: The evals aren't of articles but of the credibility of the news publisher (the domain name), based on evals by trained reviewers (Media Bias/Fact Check, NewsGuard, etc.). My raw data on those unreliable sites is now in a spreadsheet: Iffy 2020-04 (https://docs.google.com/spreadsheets/d/1ck1_FZC-97uDLIlvRJDTrGqBk0FuDe9yHkluROgpGS8/edit#gid=707857677?usp=sharing).
Q: How much of the composite credibility score is built now?
A: None. I first need to determine which signals most accurately distinguish fake-news sites from fact-based news sites and can also be pulled from an API (e.g., year online, Wikipedia article info). The hypothesis is outlined in the repo: CredScore (https://github.com/hearvox/unreliable-news/blob/master/topics/credscore.md). (Once we know the most accurate signals, we may throw some AI at it to get the best weight for each signal.)
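As a rough illustration of what a weighted composite could look like once signals are chosen, here is a minimal Python sketch. Nothing here is built yet: the signal names, the 0-to-1 normalization, and the weights are all hypothetical placeholders, not the CredScore design.

```python
# Sketch only: signal names, weights, and the 0..1 scaling are assumptions.
HYPOTHETICAL_WEIGHTS = {
    "years_online": 0.5,           # assumed: older domains score higher
    "has_wikipedia_article": 0.5,  # assumed: 1.0 if an article exists, else 0.0
}


def composite_score(signals: dict, weights: dict = HYPOTHETICAL_WEIGHTS) -> float:
    """Weighted average of the signals that are present, each already scaled to 0..1.

    Signals missing for a site are skipped and the remaining weights are
    renormalized, so a site is not penalized for an API that returned nothing.
    """
    used = {name: w for name, w in weights.items() if name in signals}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no known signals supplied")
    return sum(signals[name] * w for name, w in used.items()) / total_weight
```

For example, `composite_score({"years_online": 1.0, "has_wikipedia_article": 0.0})` averages the two placeholder signals. Learned weights could later replace the fixed dictionary without changing the function.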
Q: Experimented w/ data analysis based on the WP API?
A: I've experimented with the Wikipedia API and confirmed that searching on the (URL-encoded) news-site name often returns the site's article, if one exists. From the article I can pull its infobox, and from the infobox I can often machine-distinguish fake from fact-based publications. For instance, compare the infoboxes at:
https://en.wikipedia.org/wiki/Enid_News_%26_Eagle
https://en.wikipedia.org/wiki/NewsPunch
For related projects I have scripts that regularly and programmatically pull, analyze, and present data from APIs such as Alexa Web Information Service, BuiltWith, Google PageSpeed, Internet Archive, and WebpageTest.org (see my overview: APIs and Tools for fact-checking, https://github.com/hearvox/unreliable-news/blob/master/ref/apis-for-fact-checking.md). I'll adapt these scripts to the Wikipedia API.
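The search-then-infobox lookup described above might be sketched in Python roughly as follows. This is not the existing script: the helper names and the sample wikitext are made up for illustration, and the infobox parser is deliberately naive (one `| key = value` pair per line, no nested templates). The query parameters are those of the standard MediaWiki Action API.

```python
import re
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def search_url(site_name: str) -> str:
    """URL for a full-text search on the news-site name (urlencode escapes it)."""
    params = {"action": "query", "list": "search", "srsearch": site_name,
              "srlimit": 1, "format": "json"}
    return f"{API}?{urlencode(params)}"


def wikitext_url(title: str) -> str:
    """URL that returns the raw wikitext of one article."""
    params = {"action": "parse", "page": title, "prop": "wikitext", "format": "json"}
    return f"{API}?{urlencode(params)}"


def infobox_fields(wikitext: str) -> dict:
    """Naive parser: collect '| key = value' lines from the first infobox."""
    start = wikitext.lower().find("{{infobox")
    if start == -1:
        return {}
    fields = {}
    for line in wikitext[start:].splitlines()[1:]:
        line = line.strip()
        if line.startswith("}}"):  # end of the infobox template
            break
        match = re.match(r"\|\s*([^=|]+?)\s*=\s*(.*)", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


# Hypothetical wikitext, not copied from any real article.
SAMPLE = """{{Infobox newspaper
| name = Example Daily
| type = Daily newspaper
| founded = 1893
}}
'''Example Daily''' is a newspaper."""

print(infobox_fields(SAMPLE))
```

A real run would fetch `search_url(...)`, take the top hit's title, fetch `wikitext_url(title)`, and pass the returned wikitext to `infobox_fields`. Which infobox fields actually separate fake from fact-based publishers is still the open question.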
Q: How would you make the composite credibility assessments available?
A: Step one would be presenting the data at the site and sharing the raw data via the site, which uses the WordPress CMS, so I can use its built-in REST API to output JSON. The data will also auto-import into a public Google spreadsheet, for folks who prefer data in that format. (I've configured sites and sheets to share similar data in other projects.)
If CredScore proves effective at auto-detecting fake news with a high level of certainty, the next step might be browser extensions and advertising blacklists. Adaptable scripts for both already exist.
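A consumer of that JSON output might look something like the Python sketch below. It assumes, purely for illustration, that each rated domain is published as a standard WordPress post, so the core `/wp-json/wp/v2/posts` route applies; the actual schema is undecided, and the sample payload is invented.

```python
import json
from urllib.parse import urlencode


def posts_endpoint(site: str, per_page: int = 100, page: int = 1) -> str:
    """URL of the core WordPress REST route for posts, one page of results."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/posts?{query}"


def summarize(payload: str) -> list:
    """Reduce a posts response to (slug, rendered title) pairs."""
    return [(post["slug"], post["title"]["rendered"]) for post in json.loads(payload)]


# Invented response body, shaped like the standard posts schema.
SAMPLE_PAYLOAD = json.dumps([
    {"id": 1, "slug": "example-site-com", "title": {"rendered": "example-site.com"}},
])

print(posts_endpoint("https://iffy.news"))
print(summarize(SAMPLE_PAYLOAD))
```

If the ratings end up in a custom post type or custom fields instead, only the route and the keys read in `summarize` would change.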