Talk:2019/Grants/Iffy.news
Hello Barrett, thanks for this proposal. Could you say a bit more about what you have in mind?
Some specific questions:
- How are you thinking of evaluating sources based on their article? What aspects of articles would you look at, in which languages?
- How much of the composite credibility score is built now? Is there a demo?
- Have you experimented w/ doing data analysis based on the WP API before?
- How would you make the composite credibility assessments available -- do you have a schema / API in mind for that work?
Warmly, Sj (talk) 23:39, 22 April 2020 (UTC)
Reply to Sj
Q: Language?
A: To start, Iffy.news will be US only. The language of the suspect sites is English. The control group of US daily newspapers is mostly in English, with a few in Spanish, Arabic, and several Asian languages. My index of those dailies is at NewsNetrics.
Q: Evaluating sources based on their article?
A: The evals aren't of individual articles but of the credibility of the news publisher, keyed to the domain name and based on evals by trained reviewers (Media Bias/Fact Check, NewsGuard, etc.). My raw data for those unreliable sites is now in a spreadsheet: Iffy 2020-04.
Q: How much of the composite credibility score is built now?
A: None. I first need to determine the most accurate signals for distinguishing fake-news sites from fact-based news sites that can be pulled via API (e.g., year online, Wikipedia article info). The hypothesis is outlined in the repo: CredScore. (Once we know the most accurate signals, we may throw some AI at it to find the best weight for each signal.)
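As a rough illustration, here is a minimal Python sketch of how such a weighted composite might eventually be computed. The signal names and weights are hypothetical placeholders; determining the real signals and weights is exactly the work described above.

# Minimal sketch of a weighted composite credibility score.
# Signal names and weights are hypothetical placeholders; the real signals
# and weights are what the CredScore research is meant to determine.
SIGNAL_WEIGHTS = {
    "years_online": 0.4,            # normalized age of the domain, 0.0-1.0
    "has_wikipedia_article": 0.3,   # 1.0 if an article exists, else 0.0
    "reviewer_rating": 0.3,         # normalized rating from trained reviewers
}

def credscore(signals):
    """Combine normalized signals (each 0.0-1.0) into a 0-100 score."""
    total = sum(weight * signals.get(name, 0.0)
                for name, weight in SIGNAL_WEIGHTS.items())
    return round(100 * total, 1)

# Example with made-up values for a hypothetical domain:
print(credscore({"years_online": 0.8, "has_wikipedia_article": 1.0, "reviewer_rating": 0.9}))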
Q: Experimented w/ data analysis based on the WP API?
A: I've experimented with the Wikipedia API to confirm that searching on the news-site name (URL-encoded) often returns its article (if one exists), from which I can pull the infobox and, from that, often machine-distinguish fake from fact publications. For instance, see the infoboxes at:
https://en.wikipedia.org/wiki/Enid_News_%26_Eagle
https://en.wikipedia.org/wiki/NewsPunch
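A minimal Python sketch of those two calls, using the standard MediaWiki Action API and the requests library (the helper names are mine, and this only checks for the presence of an infobox template rather than parsing its fields):

# Sketch: find a news site's Wikipedia article via the search API, then
# fetch its wikitext and check whether it contains an infobox template.
import requests

API = "https://en.wikipedia.org/w/api.php"

def find_article(site_name):
    """Return the title of the top search result for a news-site name, if any."""
    params = {"action": "query", "list": "search",
              "srsearch": site_name, "srlimit": 1, "format": "json"}
    hits = requests.get(API, params=params).json()["query"]["search"]
    return hits[0]["title"] if hits else None

def get_wikitext(title):
    """Fetch the article's raw wikitext, which includes any infobox template."""
    params = {"action": "parse", "page": title,
              "prop": "wikitext", "format": "json"}
    return requests.get(API, params=params).json()["parse"]["wikitext"]["*"]

title = find_article("Enid News & Eagle")
if title:
    print(title, "has infobox:", "{{Infobox" in get_wikitext(title))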
For related projects I have scripts that regularly and programmatically pull, analyze, and present data from APIs such as Alexa Web Information Service, BuiltWith, Google PageSpeed, Internet Archive, and WebpageTest.org (see my overview: APIs and Tools for fact-checking). I'll adapt these scripts to the Wikipedia API.
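For example, the "year online" signal mentioned above can be approximated from a domain's earliest Wayback Machine capture via the Internet Archive's public CDX API; a minimal Python sketch (the helper name and the exact definition of the signal are my assumptions):

# Sketch: estimate a domain's "year online" from its earliest Wayback Machine
# capture, using the Internet Archive CDX API.
import requests

def first_archived_year(domain):
    """Year of the earliest Internet Archive capture of a domain, or None."""
    resp = requests.get("http://web.archive.org/cdx/search/cdx",
                        params={"url": domain, "output": "json", "limit": 1})
    if not resp.text.strip():  # no captures: the API returns an empty body
        return None
    rows = resp.json()
    if len(rows) < 2:          # rows[0] is the header row
        return None
    timestamp = rows[1][1]     # e.g. "19981212034652"
    return int(timestamp[:4])

print(first_archived_year("example.com"))  # substitute a news-site domain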
Q: How would you make the composite credibility assessments available?
A: Step one would be presenting the data at the site and sharing the raw data via the site, which runs on the WordPress CMS, so it can use the built-in REST API to output JSON. The data will also auto-import into a public Google spreadsheet, for folks who prefer that format. (I've configured sites and sheets to share similar data in other projects.)
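As a sketch of the consuming side, here is how pulling JSON from a WordPress site's REST API looks in Python. The wp/v2/posts route is WordPress core; whether Iffy.news will expose the credibility data on that route or a custom one is still to be decided, so this shows only the general pattern:

# Sketch: pulling JSON from a WordPress site over its built-in REST API.
# Assumes the default wp/v2 routes are enabled on the site; the route for
# the credibility data itself has not been defined yet.
import requests

resp = requests.get("https://iffy.news/wp-json/wp/v2/posts",
                    params={"per_page": 5})
for post in resp.json():
    print(post["id"], post["title"]["rendered"])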
If CredScore proves effective at auto-detecting fake news with a high level of certainty, the next step might be browser extensions and advertising blacklists. Adaptable scripts for both already exist.