OCLC Research Position Paper by Thomas Padilla: “Despite greater awareness, significant gaps persist between concept and operationalization in libraries at the level of workflows (managing bias in probabilistic description), policies (community engagement vis-à-vis the development of machine-actionable collections), positions (developing staff who can utilize, develop, critique, and/or promote services influenced by data science, machine learning, and AI), collections (development of “gold standard” training data), and infrastructure (development of systems that make use of these technologies and methods). Shifting from awareness to operationalization will require holistic organizational commitment to responsible operations. The viability of responsible operations depends on organizational incentives and protections that promote constructive dissent…(More)”.
Rosie the Robot: Social accountability one tweet at a time
Blogpost by Yasodara Cordova and Eduardo Vicente Gonçalves: “Every month in Brazil, the government team in charge of processing reimbursement expenses incurred by congresspeople receives more than 20,000 claims. This is a manually intensive process that is prone to error and susceptible to corruption. Under Brazilian law, this information is available to the public, making it possible to check the accuracy of this data with further scrutiny. But it’s hard to sift through so many transactions. Fortunately, Rosie, a robot built to analyze the expenses of the country’s congress members, is helping out.
Rosie was born from Operação Serenata de Amor, a flagship project we helped create with other civic hackers. We suspected that data provided by members of Congress, especially regarding work-related reimbursements, might not always be accurate. There were clear, straightforward reimbursement regulations, but we wondered how easily individuals could maneuver around them.
Furthermore, we believed that transparency portals and the public data weren’t realizing their full potential for accountability. Citizens struggled to understand public sector jargon and make sense of the extensive volume of data. We thought data science could help make better sense of the open data provided by the Brazilian government.
Using agile methods, specifically Domain-Driven Design, a flexible and adaptive process framework for solving complex problems, our group started studying the regulations and converting them into software code. We did this by reverse-engineering the legal documents: understanding the reimbursement rules and brainstorming ways to circumvent them. Next, we thought about the traces this circumvention would leave in the databases and developed a way to identify these traces using the existing data. The public expenses database included the images of the receipts used to claim reimbursements and we could see evidence of expenses, such as alcohol, which weren’t allowed to be paid with public money. We named our creation Rosie.
This method of researching the regulations to then translate them into software in an agile way is called Domain-Driven Design. Used for complex systems, this useful approach analyzes the data and the sector as an ecosystem, and then uses observations and rapid prototyping to generate and test an evolving model. This is how Rosie works. Rosie sifts through the reported data and flags specific expenses made by representatives as “suspicious.” An example could be purchases that indicate the Congress member was in two locations on the same day and time.
After finding a suspicious transaction, Rosie then automatically tweets the results to both citizens and congress members. She invites citizens to corroborate or dismiss the suspicions, while also inviting congress members to justify themselves.
Rosie isn’t working alone. Beyond translating the law into computer code, the group also created new interfaces to help citizens check up on Rosie’s suspicions. The same information that was spread across different places on official government websites was put together in a more intuitive, indexed and machine-readable platform. This platform is called Jarbas – its name was inspired by the AI system that controls Tony Stark’s mansion in Iron Man, J.A.R.V.I.S. (which has origins in the human “Jarbas”) – and it is a website and API (application programming interface) that helps citizens more easily navigate and browse data from different sources. Together, Rosie and Jarbas help citizens use and interpret the data to decide whether there was a misuse of public funds. So far, Rosie has tweeted 967 times. She is particularly good at detecting overpriced meals. According to open research by the group, members of Congress have reduced spending on meals by about ten percent since her introduction….(More)”.
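The kind of rule-based flagging described in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Rosie's actual implementation: the `Expense` fields, the two function names, and the "three times the median" cutoff are all hypothetical choices made for the example, and the real project's data schema and detection methods are not described in the excerpt.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass(frozen=True)
class Expense:
    # Hypothetical field names; the real dataset's schema is not given in the excerpt.
    member: str
    day: date
    city: str
    category: str
    amount: float


def flag_same_day_multiple_cities(expenses):
    """Return sorted (member, day) pairs that have claims recorded in more than one city."""
    cities_seen = defaultdict(set)
    for expense in expenses:
        cities_seen[(expense.member, expense.day)].add(expense.city)
    return sorted(key for key, cities in cities_seen.items() if len(cities) > 1)


def flag_overpriced_meals(expenses, multiplier=3.0):
    """Return meal claims that cost more than `multiplier` times the median meal claim.

    The multiplier is an assumed, illustrative cutoff, not a rule taken from the source.
    """
    meals = [e for e in expenses if e.category == "meal"]
    if not meals:
        return []
    cutoff = multiplier * median(e.amount for e in meals)
    return [e for e in meals if e.amount > cutoff]
```

Both functions only mark claims as candidates for review. As in the workflow the authors describe, a flagged claim is a suspicion that citizens and congress members are invited to confirm or explain, not a finding of misuse.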
The Challenges of Sharing Data in an Era of Politicized Science
Editorial by Howard Bauchner in JAMA: “The goal of making science more transparent—sharing data, posting results on trial registries, use of preprint servers, and open access publishing—may enhance scientific discovery and improve individual and population health, but it also comes with substantial challenges in an era of politicized science, enhanced skepticism, and the ubiquitous world of social media. The recent announcement by the Trump administration of plans to proceed with an updated version of the proposed rule “Strengthening Transparency in Regulatory Science,” stipulating that all underlying data from studies that underpin public health regulations from the US Environmental Protection Agency (EPA) must be made publicly available so that those data can be independently validated, epitomizes some of these challenges. According to EPA Administrator Andrew Wheeler: “Good science is science that can be replicated and independently validated, science that can hold up to scrutiny. That is why we’re moving forward to ensure that the science supporting agency decisions is transparent and available for evaluation by the public and stakeholders.”
Virtually every time JAMA publishes an article on the effects of pollution or climate change on health, the journal immediately receives demands from critics to retract the article for various reasons. Some individuals and groups simply do not believe that pollution or climate change affects human health. Research on climate change, and the effects of climate change on the health of the planet and human beings, if made available to anyone for reanalysis could be manipulated to find a different outcome than initially reported. In an age of skepticism about many issues, including science, with the ability to use social media to disseminate unfounded and at times potentially harmful ideas, it is challenging to balance the potential benefits of sharing data with the harms that could be done by reanalysis.
Can the experience of sharing data derived from randomized clinical trials (RCTs)—either as mandated by some funders and journals or as supported by individual investigators—serve as examples as a way to safeguard “truth” in science….
Although the sharing of data may have numerous benefits, it also comes with substantial challenges particularly in highly contentious and politicized areas, such as the effects of climate change and pollution on health, in which the public dialogue appears to be based on as much fiction as fact. The sharing of data, whether mandated by funders, including foundations and government, or volunteered by scientists who believe in the principle of data transparency, is a complicated issue in the evolving world of science, analysis, skepticism, and communication. Above all, the scientific process—including original research and reanalysis of shared data—must prevail, and the inherent search for evidence, facts, and truth must not be compromised by special interests, coercive influences, or politicized perspectives. There are no simple answers, just words of caution and concern….(More)”.
The Impact of Open Data on Public Procurement
Paper by Raphael Duguay, Thomas Rauter and Delphine Samuels: “We examine how the increased accessibility of public purchasing data affects competition, prices, contract allocations, and contract performance in government procurement. The European Union recently made its already public but difficult-to-access information about the process and outcomes of procurement awards available for bulk download in a user-friendly format.
Comparing government contracts above EU publication thresholds with contracts that are not, we find that increasing the public accessibility of procurement data raises the likelihood of having competitive bidding processes, increases the number of bids per contract, and facilitates market entry by new vendors. Following the open data initiative, procurement prices decrease and EU government agencies are more likely to award contracts to the lowest bidder.
However, the increased competition comes at a cost: firms execute government contracts with more delays and ex-post price renegotiations. These effects are stronger for new vendors, complex procurement projects, and contracts awarded solely based on price. Overall, our results suggest that open procurement data facilitates competition and lowers ex-ante procurement prices but does not necessarily increase allocative efficiency in government contracting….(More)”.
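The comparison described in this abstract (contracts above the EU publication thresholds versus those below, before and after the open data initiative) resembles a difference-in-differences setup. The abstract does not name the estimator the authors use, so the sketch below is only an assumed, simplified illustration of that general technique. The row keys `above_threshold`, `post`, and `bids` are hypothetical names, and the function computes a plain difference of group means with no controls or standard errors.

```python
from statistics import mean


def diff_in_diff(rows, outcome="bids"):
    """Difference-in-differences of group means.

    Each row is a dict with boolean keys "above_threshold" (treated group)
    and "post" (after the open data initiative), plus a numeric outcome.
    All key names are assumptions made for this example.
    """
    def cell_mean(treated, post):
        values = [
            row[outcome]
            for row in rows
            if row["above_threshold"] == treated and row["post"] == post
        ]
        if not values:
            raise ValueError("each of the four group/period cells needs at least one row")
        return mean(values)

    treated_change = cell_mean(True, True) - cell_mean(True, False)
    control_change = cell_mean(False, True) - cell_mean(False, False)
    # The estimate is the change in the treated group net of the change in the control group.
    return treated_change - control_change
```

Subtracting the change among below-threshold contracts is what separates the effect of easier data access from trends that affect all procurement over the same period.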
The Trace
About: “The Trace is an independent, nonpartisan, nonprofit newsroom dedicated to shining a light on America’s gun violence crisis….
Every year in our country, a firearm is used in nearly 500,000 crimes, resulting in the deaths and injuries of more than 110,000 people. Shootings devastate families and communities and drain billions of dollars from local, state, and federal governments. Meanwhile, the problem of gun violence has been compounded by another: the shortage of knowledge about the issue…
Data and records are shielded from public view—or don’t exist. Gun-lobby backed restrictions on federal gun violence research deprive policymakers and public health experts of potentially life-saving facts. Other laws limit the information that law enforcement agencies can share on illegal guns and curb litigation that could allow scrutiny of industry practices….
We make the problem clear. In partnership with Slate, we built an eye-opening, interactive map plotting the locations of nearly 40,000 incidents of gun violence nationwide. The feature received millions of pageviews and generated extensive local coverage and social media conversation. “So many shootings and deaths, so close to my home,” wrote one reader. “And I hadn’t even heard about most of them.”…(More)”.
Benefits of Open Data in Public Health
Paper by P. Huston, VL. Edge and E. Bernier: “Open Data is part of a broad global movement that is not only advancing science and scientific communication but also transforming modern society and how decisions are made. What began with a call for Open Science and the rise of online journals has extended to Open Data, based on the premise that if reports on data are open, then the generated or supporting data should be open as well. There have been a number of advances in Open Data over the last decade, spearheaded largely by governments. A real benefit of Open Data is not simply that single databases can be used more widely; it is that these data can also be leveraged, shared and combined with other data. Open Data facilitates scientific collaboration, enriches research and advances analytical capacity to inform decisions. In the human and environmental health realms, for example, the ability to access and combine diverse data can advance early signal detection, improve analysis and evaluation, inform program and policy development, increase capacity for public participation, enable transparency and improve accountability. However, challenges remain. Enormous resources are needed to make the technological shift to open and interoperable databases accessible with common protocols and terminology. Amongst data generators and users, this shift also involves a cultural change: from regarding databases as restricted intellectual property, to considering data as a common good. There is a need to address legal and ethical considerations in making this shift. Finally, along with efforts to modify infrastructure and address the cultural, legal and ethical issues, it is important to share the information equitably and effectively. While there is great potential of the open, timely, equitable and straightforward sharing of data, fully realizing the myriad of benefits of Open Data will depend on how effectively these challenges are addressed….(More)”.
The Public-Data Opportunity: Why Governments Should Share More
Press Release: “The Lisbon Council launches The Public-Data Opportunity: Why Governments Should Share More, a new discussion paper that looks at the state of play for public-sector data sharing – and calls for better protocols and procedures to deliver data-driven service to all Europeans. The paper analyses the importance of data-sharing between European Union public agencies, identifies the barriers and proposes seven policy recommendations that will help lift them. It builds on the research conducted by the “Understanding Value Co-Creation in Public Services for Transforming European Public Administrations” project, or Co-VAL, a 12-partner research consortium, co-funded by the European Union. It was launched at The 2019 Digital Government Conference convened by the Presidency of the European Council of Finland in Helsinki….(More)”.
Restricting data’s use: A spectrum of concerns in need of flexible approaches
Dharma Akmon and Susan Jekielek at IASSIST Quarterly: “As researchers consider making their data available to others, they are concerned with the responsible use of data. As a result, they often seek to place restrictions on secondary use. The Research Connections archive at ICPSR makes available the datasets of dozens of studies related to childcare and early education. Of the 103 studies archived to date, 20 have some restrictions on access. While ICPSR’s data access systems were designed primarily to accommodate public use data (i.e. data without disclosure concerns) and potentially disclosive data, our interactions with depositors reveal a more nuanced range of needs for restricting use. Some data present a relatively low risk of threatening participants’ confidentiality, yet the data producers still want to monitor who is accessing the data and how they plan to use them. Other studies contain data with such a high risk of disclosure that their use must be restricted to a virtual data enclave. Still other studies rest on agreements with participants that require continuing oversight of secondary use by data producers, funders, and participants. This paper describes data producers’ range of needs to restrict data access and discusses how systems can better accommodate these needs….(More)”.
Open Cities | Open Data: Collaborative Cities in the Information Era
Book edited by Scott Hawken, Hoon Han and Chris Pettit: “Today the world’s largest economies and corporations trade in data and its products to generate value in new disruptive markets. Within these markets vast streams of data are often inaccessible or untapped and controlled by powerful monopolies. Counter to this exclusive use of data is a promising world-wide “open-data” movement, promoting freely accessible information to share, reuse and redistribute. The provision and application of open data has enormous potential to transform exclusive, technocratic “smart cities” into inclusive and responsive “open-cities”.
This book argues that those who contribute urban data should benefit from its production. Like the city itself, the information landscape is a public asset produced through collective effort, attention, and resources. People produce data through their engagement with the city, creating digital footprints through social media, mobility applications, and city sensors. By opening up data there is potential to generate greater value by supporting unforeseen collaborations, spontaneous urban innovations and solutions, and improved decision-making insights. Yet achieving more open cities is made challenging by conflicting desires for urban anonymity, sociability, privacy and transparency. This book engages with these issues through a variety of critical perspectives, and presents strategies, tools and case studies that enable this transformation….(More)”.
Community Data Dialogues
Sunlight Foundation: “Community Data Dialogues are in-person events designed to share open data with community members in the most digestible way possible to start a conversation about a specific issue. The main goal of the event is to give residents who may not have technical expertise but have local experience a chance to participate in data-informed decision-making. Doing this work in-person can open doors and let facilitators ask a broader range of questions. To achieve this, the event must be designed to be inclusive of people without a background in data analysis and/or using statistics to understand local issues. Carrying out this event will let decision-makers in government use open data to talk with residents who can add to data’s value with their stories of lived experience relevant to local issues.
These events can take several forms, and groups both in and outside of government have designed creative and innovative events tailored to engage community members who are actively interested in helping solve local issues but are unfamiliar with using open data. This guide will help clarify how exactly to make Community Data Dialogues non-technical, interactive events that are inclusive to all participants….
A number of groups both in and outside of government have facilitated accessible open data events to great success. Here are just a few examples from the field of what data-focused events tailored for a nontechnical audience can look like:
Data Days Cleveland
Data Days Cleveland is an annual one-day event designed to make data accessible to all. Programs are designed with inclusivity and learning in mind, making it a more welcoming space for people new to data work. Data experts and practitioners direct novices on the fundamentals of using data: making maps, reading spreadsheets, creating data visualizations, etc….
The Urban Institute’s Data Walks
The Urban Institute’s Data Walks are an innovative example of presenting data in an interactive and accessible way to communities. Data Walks are events gathering community residents, policymakers, and others to jointly review and analyze data presentations on specific programs or issues and collaborate to offer feedback based on their individual experiences and expertise. This feedback can be used to improve current projects and inform future policies….(More)”.