2019/Grants/Source Scour
Title:
Source Scour
Name:
Michael Morisy
Wikimedia username:
morisy
E-mail address:
michael@muckrock.com
Resume:
https://github.com/MuckRock/
https://github.com/mattkiefer
https://www.linkedin.com/in/morisy/
https://docs.google.com/document/d/1F1mPZ8sZ6bJ5O7u5wMr-DGtchVIGm7pKrVmDI5jM0ek/edit?usp=sharing
Geographical impact:
North America
Type of project:
Technology
What is your idea?
The speed of the coronavirus’s impact has left officials struggling to catch up, producing a patchwork of different policies across countries, states, counties, and even cities in North America, with those edicts changing day by day.
Rapidly getting evidence-based surveys of where those policies stand, and how they’re changing, is critical to informing the public as well as to understanding the efficacy of various approaches. Equally important is being able to quickly gather and summarize other information from around the country, such as the status of open ER beds or the accessibility of state unemployment websites. Building on existing crowdsourcing software created by MuckRock/DocumentCloud and used by thousands of newsrooms around the world, we will modify our tooling to organize those eager to help, breaking the work down into individual tasks that are then synthesized in a way that scales up to the challenge while retaining a high degree of reliability and proper sourcing.
Using Wikidata’s entries on municipalities and government agencies, volunteers will be presented with simple instructions, such as pasting a link to policy documents from that agency’s website or social media and indicating which category those policies fall under. The overarching projects will be defined by journalists within the MuckRock network of 3,000 newsrooms and universities, but the collected data will be shared publicly with everyone, and can also inform data sets to be added or updated within the Wikidata corpus.
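To make that integration concrete, below is a minimal sketch of how jurisdiction entities might be pulled from the public Wikidata Query Service. The county class (Q47168, "county of the United States") and official-website property (P856) are our working assumptions, and names like fetch_counties are illustrative rather than part of any existing codebase.

```python
import requests

# SPARQL for the public Wikidata Query Service. Q47168 and P856 are our
# working assumptions for the county class and official-website property.
SPARQL = """
SELECT ?county ?countyLabel ?website WHERE {
  ?county wdt:P31 wd:Q47168 .                # instance of: county of the United States
  OPTIONAL { ?county wdt:P856 ?website . }   # official website, when recorded
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

def fetch_counties():
    """Yield one jurisdiction record per county entity in Wikidata."""
    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": SPARQL, "format": "json"},
        headers={"User-Agent": "SourceScour/0.1 (prototype)"},
        timeout=60,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        yield {
            "qid": row["county"]["value"].rsplit("/", 1)[-1],  # trailing QID from the entity URI
            "name": row["countyLabel"]["value"],
            "website": row.get("website", {}).get("value"),
        }
```

Because each record carries the Wikidata QID, every volunteer submission stays linked to a canonical entity rather than a free-text place name.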
For example, based on newsroom reporting, there might be strong interest in which cities have shelter-in-place mandates versus social distancing guidelines. Rallying our existing audience, we can have people go through each county’s website and submit the URL of its policies. A second round of volunteers will classify each of those policies according to set criteria, and a final round will verify the submitted information.
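As a rough sketch of how those rounds could be synthesized, the tally below marks a jurisdiction’s policy category as verified once enough independent volunteers agree; the two-volunteer threshold and field names are assumptions for illustration, not a committed review policy.

```python
from collections import Counter

def resolve(classifications, threshold=2):
    """Mark a policy category verified once enough volunteers agree on it."""
    if not classifications:
        return {"category": None, "status": "unanswered", "votes": 0}
    category, votes = Counter(classifications).most_common(1)[0]
    status = "verified" if votes >= threshold else "needs-review"
    return {"category": category, "status": status, "votes": votes}

# e.g. resolve(["shelter-in-place", "shelter-in-place", "social-distancing"])
# -> {"category": "shelter-in-place", "status": "verified", "votes": 2}
```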
Newsrooms and public interest groups have told us that being able to quickly get answers to these kinds of questions is critical to properly covering this fast-moving story, as well as to helping set better policy at both the local and national levels.
Why is it important?
The information gaps around COVID-19 are massive, particularly when it comes to second-order impacts ranging from the accessibility of unemployment resources and food security to how schools are handling mass closures. At the same time, the flood of news about COVID-19 is deeply numbing, and people feel they have few ways to contribute while stuck in isolation.
This proposal aims to push back on both challenges at once: giving people a meaningful, participatory way to help while closing critical gaps in the information landscape.
Is your project already in progress?
MuckRock’s Assignment tool is up and working, but it does not yet allow variables within text questions. Instead, it lets you upload PDFs and have questions presented on a page-by-page basis for that PDF (example: https://www.muckrock.com/assignment/help-explore-donald-rumsfelds-snowflakes-30/form/). We have been talking with other newsrooms that are keenly interested in access to this kind of infrastructure and data, and we believe that, with more newsrooms eager to collaborate and to quickly build data sets on the fly, this is the perfect opportunity to take on this project.
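As an illustration of the variable support we would add, a single templated question could be expanded into one task per jurisdiction pulled from Wikidata; the question text and task fields below are hypothetical, not MuckRock’s actual Assignment schema.

```python
from string import Template

# Hypothetical templated question; the real Assignment schema may differ
# once variable support is built.
QUESTION = Template(
    "Find the official COVID-19 policy page for $county. "
    "Paste its URL and choose the category that best describes the order."
)

def build_tasks(counties):
    """Expand one templated question into one task per jurisdiction record."""
    for county in counties:  # records like those yielded by the Wikidata sketch above
        yield {
            "wikidata_qid": county["qid"],         # ties each answer back to its entity
            "prompt": QUESTION.substitute(county=county["name"]),
            "source_hint": county.get("website"),  # starting point for volunteers, if known
        }
```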
We’ve also been working on larger COVID-19-related information needs through community surveys around the country, so we have an active, interested network as well as clear areas of prioritization.
How is it relevant to credibility and Wikipedia? (max 500 words)
Access to clearly sourced, vetted information is particularly challenging as data and documents are increasingly snipped out of context and recirculated. It’s also easy for newsrooms to see only individual pieces of data, such as shelter-in-place orders or one city’s food security response, without a stronger understanding of the overall landscape. This erodes public trust, which can be fatal in times like these.
By integrating our Assignments tool with Wikidata’s jurisdiction and agency information, we can quickly scale up newsgathering resources to tackle the immensity of the challenge while providing clear and understandable links to how and where that information was sourced. Perhaps just as importantly, letting the public participate in gathering and checking that information builds their investment in and trust of the end product, helping push back against the cycle of increasing distrust.
What is the ultimate impact of this project?
In the short term, we believe this fills an urgent need to gather and assess health edicts, public messaging, and other crucial details on a state-by-state, county-by-county basis, in a way that showcases effective interventions and policies while helping the public better understand the full impact of COVID-19 across North America, and how elected officials are responding to it.
We also think this is a potentially powerful way to give a broad swath of people, often stuck at home and feeling powerless, a greater sense of agency in confronting a terrible crisis.
Long term, we believe that more collaborative, crowdsourced approaches to gathering and vetting source materials can help fill a serious hole in local journalism around the world, bridging the public’s curiosity and desire to be informed with an increasingly challenging information landscape.
Could it scale?
We believe that this project could quickly scale up to be used by thousands of newsrooms and tens of thousands of participants looking to connect and help build the public’s understanding of the spread of COVID-19. The platforms this project builds on, MuckRock and DocumentCloud, are used by a network of 3,000 newsrooms and currently reach 40 million monthly readers with direct access to primary source information, including public records requests, documents posted by newsrooms, and more.
While our immediate focus is addressing information gaps and needs related to COVID-19, we believe there are nearly limitless similar information-gathering and analysis challenges that this project could be broadened to address.
Why are you the people to do it?
The MuckRock Foundation mixes a strong, diverse community of transparency enthusiasts with a network of 3,000 newsrooms and a technology team that runs and maintains six open source tools relied upon by tens of thousands of users every month. We’ve been building up our crowdsourcing toolkit for several years and deeply understand the challenges and opportunities in this approach, and this project is a high-impact extension of that work.
Additionally, Matt Kiefer, a John S. Knight Journalism Fellow at Stanford, will be consulting and assisting on the project. His focus before and during the fellowship has been journalism automation in pursuit of monitoring civil and human rights issues, and he brings experience and expertise both from that work and from building the FOIAmail software suite.
What is the impact of your idea on diversity and inclusiveness of the Wikimedia movement?
Access to reliable information matters to everyone, but people are often intimidated by contributing to Wikipedia and Wikidata. Our goal is that, by coming to people with clear asks tied to a pressing public need, we can make it easy for them to start contributing to a public commons in new, innovative, and inviting ways, which for some will become an entry point to further involvement.
What are the challenges associated with this project and how you will overcome them?
Crowdsourcing data requires both a willing set of volunteers as well as the capacity to check and make use of the collected data, essentially a two-sided market problem. We are well suited to help address this challenge since the MuckRock network has 3,000 member newsrooms and our portfolio of sites reaches 40 million people each month — we can promote these opportunities to that audience as the project scales up.
Other challenges include integrating Wikidata into our systems and ensuring data conformity, but we have a broader team of technical experts who can help troubleshoot tricky problems that arise.
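As a sketch of what that data conformity checking could look like, submissions could be validated before they enter the review rounds; the category list and field names below are illustrative, not a finalized schema.

```python
from urllib.parse import urlparse

# Illustrative category list, not a finalized taxonomy.
CATEGORIES = {"shelter-in-place", "social-distancing", "no-order-found"}

def validate_submission(sub):
    """Return a list of problems; an empty list means the answer can proceed."""
    errors = []
    if sub.get("category") not in CATEGORIES:
        errors.append("unknown category")
    parsed = urlparse(sub.get("url", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("source URL must be a full http(s) link")
    if not sub.get("wikidata_qid", "").startswith("Q"):
        errors.append("submission must reference a Wikidata entity")
    return errors
```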
How much money are you requesting?
$7500 USD
How will you spend the money?
$1,000 will go towards server costs; $2,500 towards backend development of the system; $2,000 towards front-end design; and $2,000 towards analysis of submissions, including original reporting, error checking, and preparation for public release.
How long will your project take?
The timeline for this project is three months.
Have you worked on projects for previous grants before?
Yes. We received a grant from the News Integrity Initiative to rebuild the DocumentCloud document analysis and publication platform (https://www.muckrock.com/news/archives/2018/oct/10/news-integrity-initiative-grant-muckrock-doccloud/). The rebuilt platform has launched at beta.documentcloud.org.
The Assignments tool, which this project grows out of, was originally funded by a grant from the Knight Foundation.
That tool is part of the larger MuckRock code base, which is open sourced here: https://github.com/muckrock/muckrock